
Saturday, January 4, 2020

At Last a Black Bear Genome in GenBank and it Matches Sample 26 100%ID

“If it were a black bear it would have matched 100%.”  Melba Ketchum in Die Tiefe

Finally, in 2018, a complete black bear genome was sequenced and submitted to GenBank as accession ASM334442v1 in the “Assembly” database.  I had been checking regularly with NCBI to see whether a black bear whole genome had been submitted.  I was thrilled.  It would now be possible to query the Sample 26 sequence against this massive genome of 2.5 billion bases, in 231,673 contigs and 111,495 scaffolds.


The top six hits by score were (Table 1):


Table 1.  Sample 26 vs. Black Bear Whole Genome

Accession#        %ID    Length    Score
LZNR01001132.1    100    1237      2285
LZNR01003708.1    100    1157      2137
LZNR01005486.1    100    1114      2058
LZNR01000516.1    100    1103      2037
LZNR01074398.1    100    1093      2019
LZNR01003247.1    100    1043      1927

#These are scaffolds.

There were 1725 hits above 200 bases in length with 100%ID. This is an excellent match, even better than the dog in Sample 140. Based on previous matches in other databases, I was not surprised at this result, but it was gratifying. As mentioned previously, GenBank is a moving target. Fortunately, it gets bigger with time. With the rapid advances in sequencing technology, the databases are growing exponentially. If the species you want is missing, or is not covered to the extent that you want, just wait a while (five years in this case).

Fig. 1 shows the plot of all hits above 99.5%ID and 200 bp. As mentioned above, the solid line at 100%ID contains 1725 hits. There are 849 other hits above 99.5%ID in this figure.

Figure 1.  Each diamond represents a hit > 200 bp and > 99.5%ID.  The abscissa is the hit's starting point on the Sample 26 sequence.  Most of the data are in the 100%ID solid line of overlapping points.
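
For readers who want to reproduce this kind of tally, here is a minimal Python sketch of the filtering step. It assumes the hits were saved in BLAST's 12-column tabular format (query ID, subject ID, %ID, alignment length, and so on), which may not be how the search above was actually run. The function name `summarize_hits`, the subject names, and the sample rows are all made up for illustration; they are not data from the Sample 26 search.

```python
import csv
import io


def summarize_hits(tabular_text, min_length=200, min_identity=99.5):
    """Count hits longer than min_length with %ID above min_identity.

    Returns (perfect, near): hits at exactly 100%ID, and hits above the
    identity cutoff but below 100%.
    """
    perfect = 0
    near = 0
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        identity = float(row[2])  # column 3: percent identity
        length = int(row[3])      # column 4: alignment length
        if length <= min_length or identity <= min_identity:
            continue
        if identity == 100.0:
            perfect += 1
        else:
            near += 1
    return perfect, near


# Hypothetical rows, invented purely to exercise the filter.
sample = "\n".join([
    "query1\tsubject_A\t100.000\t1200\t0\t0\t1\t1200\t1\t1200\t0.0\t2217",
    "query1\tsubject_B\t99.800\t500\t1\t0\t2000\t2499\t1\t500\t0.0\t918",
    "query1\tsubject_C\t100.000\t150\t0\t0\t3000\t3149\t1\t150\t1e-70\t278",
    "query1\tsubject_D\t98.000\t400\t8\t0\t4000\t4399\t1\t400\t0.0\t680",
])

perfect, near = summarize_hits(sample)
print(perfect, near)
```

In this made-up input, one hit is too short and one falls below the identity cutoff, so only two of the four rows are counted.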

And Melba, bears have mutations too, so 100%ID is not required for a species match.  Depending on the phylogeny, above 99-99.5% is considered a good species match. The whole system of human haplogroups is based on mutations and slight mismatches of mtDNA.  We don't always match each other 100% either.
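
To put numbers on that, the short Python sketch below works out how many mismatched positions a hit can carry and still clear a given %ID cutoff. It assumes the simple definition of percent identity as matching positions divided by alignment length; the function names are my own, and the alignment lengths are just examples.

```python
import math


def percent_identity(matches, alignment_length):
    """Percent identity as matching positions over alignment length."""
    return 100.0 * matches / alignment_length


def max_mismatches(alignment_length, min_identity):
    """Largest number of non-matching positions that keeps %ID at or above the cutoff."""
    return math.floor(alignment_length * (100.0 - min_identity) / 100.0)


# A 1000-base alignment with 5 mismatches is still a 99.5%ID hit.
print(percent_identity(995, 1000))

# Mismatches tolerated over 1000 bases at the 99.5% and 99% cutoffs.
print(max_mismatches(1000, 99.5), max_mismatches(1000, 99.0))
```

So even at the stricter 99.5% cutoff, a 1000-base hit can differ at five positions and still count as a good match.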