
Thursday, June 4, 2015

RNA Data Show Ketchum Sample 26 - the Smeja Kill - is a Black Bear




ABSTRACT


The Ketchum Sample 26 nDNA sequence (2,726,786 bp long) was searched in total against the RNA reference genomic sequence (RNA ref_seq) database of the National Center for Biotechnology Information (NCBI), commonly known as GenBank.  The matches to polar bear were far better than those to any other species.  Primates, including human, were far poorer matches than the next best non-bear matches.  A phylotree showed Sample 26 in precisely the position that a black bear would occupy.  Since Sample 26 is from California and since there are no black bear data in this database, the polar bear is the surrogate match from the same genus.  Sample 26 is a black bear.


INTRODUCTION



The conclusion of the Ketchum paper, “Novel North American Hominins…..” [1], unsupported by any published sequence comparisons, was that:


“…the species (sasquatch) possesses a novel mosaic pattern of nuclear DNA comprising novel sequences that are related to primates interspersed with sequences that are closely homologous to humans.” 


However, we showed previously, with multiple sequence comparisons, that Sample 26 (S26) – the Smeja kill – is a bear, most likely a black bear (Ursus americanus)[2], in agreement with three separate laboratory analyses [3], and that Sample 140 is a dog [2]. Only Sample 31 matched a human best [2]. Primates, including humans, were not even close to matching Samples 26 and 140. Reference [2] involved milking the “Nucleotide”, “Transcriptome shotgun analysis” (TSA) and “Ref_seq genomic” (RSG) databases. At that time all polar bear data was confined to the TSA database and was missed by the Ketchum team, and there was little black bear data in any of these databases. Consequently our best hits were polar bear (Ursus maritimus) or giant panda (Ailuropoda melanoleuca). We updated our results later when a whole polar bear genome was added to the RSG database [4]. As expected it was now the best hit, as it is in the same genus (Ursus) as the black bear, whereas giant panda is not.


Later still, we searched the “Expressed sequence tags” (EST) database, which has black bear data, and found it matched S26 better than any other species.[5] For sequences where there was no black bear data, the best other species hits were outmatched by polar bear data from the RSG database. In only one out of 59 best hit sequences was a dog the best match. Other results showed the presence of human, dog, and bear DNA in S26 [6] and the fallacy of the Ketchum consultants’ methodology [7]. Other problems with the mtDNA were also addressed [8], including degradation [9].


The volume and consistency of these results should have convinced anyone of the bear origin of S26. However, some remain unconvinced. See Reference [10] for Melba Ketchum’s reasons for remaining so. Hence, we search yet another database here, the “RNA ref_seq” database. The database contains RNA sequences, including polar bear but not black bear. RNA (ribonucleic acid) is the complement of DNA (deoxyribonucleic acid): base U pairs with A, A with T, C with G, and G with C. It uses three-base codons for amino acids to manufacture proteins, the workhorses of the cell. Any biochemistry text will explain this in more detail. The takeaway point here is that these RNA sequences can be searched against DNA queries – the software makes the base conversions mentioned above. In this paper we search the previously untapped RNA ref_seq database with the entire 2,726,786 bp S26 nDNA sequence from [1]. Our results and conclusions corroborate our previous findings, but contrast sharply with the Ketchum et al. conclusion above.
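The base conversion described above can be sketched in a few lines of Python. This is only an illustration of the pairing rule, not the BLAST™ implementation; the function name and the example sequence are hypothetical.

```python
# Pairing rule from the text: DNA A -> RNA U, T -> A, C -> G, G -> C.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "C": "G", "G": "C"}


def rna_complement(dna: str) -> str:
    """Return the RNA complement of a DNA string, base by base (hypothetical helper)."""
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in dna.upper())


if __name__ == "__main__":
    print(rna_complement("ATGC"))  # UACG
```

A search tool only needs to apply a mapping like this (and its reverse) to compare a DNA query against RNA database entries.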




 
COMPUTATIONAL METHODS
 
All searches were conducted with the BLAST™ software (http://blast.be-md.ncbi.nlm.nih.gov/Blast.cgi) of the NCBI, and hits were downloaded as Excel files as described previously [2].  Search parameters were default, except that for whole S26 sequence searches the maximum number of hits was set to 5000 and the word size to 64.  This is the same search software used by Ketchum et al. [1].  Her claim that “He didn't use bioinformatics software which has to be done in order to evaluate the data.” [10] is false.  The “BI” in NCBI means “biotechnology information” = "bioinformatics."  One does not need extra “bioinformatics” software to compare two numbers such as %IDs or scores. The hard part, which Ketchum et al. totally failed at, is generating relevant sequences to compare through appropriate searches.  The phylotrees (Figs. 1 and 2) were constructed just as the Ketchum Supp. Figs. 5 and 6 (reproduced here as Figs. 3 and 4) were, with the BLAST™ feature “Distance tree of results”, as discussed below.


Another Ketchum claim, “You can't use only statistics to evaluate the sequences nor by tearing it down into little sequences unless you have software/expertise to do so.” is a red herring. My procedures as described in [1], and used throughout, do not involve “tearing it down into little sequences”. That was her consultants’ approach, which was described by an NCBI contact of ours as “makes no sense”. Only in [7] did we break the Ketchum nDNA sequences down to compare results to her consultants’. Even then I got better matches to a bear and a dog for S26 and S140, respectively, with their methodology. You just have to search in the right databases. Incidentally, statistics are widely used in genetics, including the use of the Poisson distribution of mutations, which is what we did in [8] to show that some of the Ketchum mtDNA sequences were outside the range of normal humans. They were not our “only” tool, and in fact were “only” used once in the case of the mtDNA sequences in [8].


Our computational methods are sound and used by geneticists everywhere in some form or another. We applied them across multiple databases and made conclusions based on known phylogeny and taxonomy. Ketchum et al. did not.

 



RESULTS AND DISCUSSION


The RNA ref_seq database was queried with the entire S26 sequence, 5000 max hit entries, word size = 64, limited to mammals. A list of 27,135 total hits (more than one per database entry) was downloaded and sorted by score, then %ID. The best 30 hits by score were culled and examined. Table 1 shows the best 15 hits (plain text). Ten were polar bear (the only bear in this database), three had no polar bear sequences in the same range, and two had shorter, but higher %ID polar bear matches (in bold italics). Over these latter five ranges, searches of the Reference genomic sequences (refseq_genomic) database limited to bears produced the underlined hits in Table 1. In these cases, polar bear was the best match by score and %ID.
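The sort-and-cull step can be sketched in Python as follows. The four rows are RNA ref_seq hits copied from Table 1; the `Hit` type and the `rank_hits` helper are hypothetical names, and the actual sorting was done in a spreadsheet on the downloaded hit list.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    accession: str
    pct_id: float
    score: int
    species: str


# Four RNA ref_seq rows copied from Table 1, deliberately out of order.
HITS = [
    Hit("XM_008688345.1", 98.90, 2593, "polar bear"),
    Hit("XM_004394587.1", 96.58, 3515, "Pacific walrus"),
    Hit("XM_008704386.1", 97.86, 2870, "polar bear"),
    Hit("XM_008688318.1", 100.00, 1921, "polar bear"),
]


def rank_hits(hits, keep=30):
    """Sort by score, then %ID (both descending) and keep the best `keep` hits."""
    ordered = sorted(hits, key=lambda h: (h.score, h.pct_id), reverse=True)
    return ordered[:keep]


if __name__ == "__main__":
    for hit in rank_hits(HITS):
        print(hit.score, hit.pct_id, hit.species, hit.accession)
```

Sorting on the pair (score, %ID) means %ID only breaks ties between hits with equal scores, which matches the "sorted by score, then %ID" order used here.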


The RNA ref_seq database was again queried limited to polar bear. The best 15 hits are listed in Table 2 (plain text). Ten of these are the same as the polar bear hits in Table 1. The best non polar bear hits over the same hit ranges from the mammals list were added to Table 2 (bold italic). A query limited to human only was performed and the best human matches over the same hit ranges were added to Table 2 and underlined. Ten of these sequence ranges matched ranges for best hits in [1] or [5]. In every case the polar bear was the best match by score and %ID. Human matches were a distant third.
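The per-range comparison can be sketched as below: group the hits by query range and pick the species with the best score (ties broken by %ID). The nine rows are copied from Table 2 and keyed by the start position of the polar bear hit for that range; the function name is hypothetical.

```python
from collections import defaultdict

# (range start on S26, species, %ID, score) for three ranges copied from Table 2.
ROWS = [
    (1655920, "polar bear", 97.86, 2870),
    (1655920, "giant panda", 96.45, 2736),
    (1655920, "human", 92.27, 2351),
    (759948, "polar bear", 98.60, 2388),
    (759948, "domestic ferret", 98.08, 2340),
    (759948, "human", 94.76, 2074),
    (184464, "polar bear", 99.58, 1738),
    (184464, "dog", 99.27, 1729),
    (184464, "human", 98.12, 1663),
]


def best_by_range(rows):
    """Return {range_start: species with the highest (score, %ID)} (hypothetical helper)."""
    groups = defaultdict(list)
    for start, species, pct_id, score in rows:
        groups[start].append((score, pct_id, species))
    # max() compares tuples left to right: score first, then %ID.
    return {start: max(hits)[2] for start, hits in groups.items()}


if __name__ == "__main__":
    print(best_by_range(ROWS))
```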


Phylotrees are comparisons to multiple species, based on pairwise comparisons to their many sequences, and as such offer much more proof of identity than any single match does. A phylotree (distance tree of results) was produced from BLAST™ results (mammal) as Fig. 1. It clearly shows S26 in a close phylogenetic relationship to carnivores, especially bears. Notice the very distant relationship to all primates, including human. See the NCBI taxonomy database for comparison [11]. The unopened leaves (not shown in Fig. 1) revealed many of the species seen previously based on the RSG database [7] and in the same relative positions. Fig. 2 shows the expanded carnivore leaves from Fig. 1. Notice, as expected, walrus, seal, dog, ferret, polar bear and panda from Table 2. Contrast these phylotrees with the Ketchum et al. conclusion in the INTRODUCTION, and with the meaningless Ketchum phylotrees (Supplemental Figures 5 and 6 in [1]) as discussed in [7]. Those phylotrees are reproduced here for comparison as Figs. 3 and 4, taken from [7] and based on [1]. As we commented before,[7] “Simply stated, chicken, fish, mouse, and human are too distantly related to each other to be, as a group, the most related species to the Ketchum Sample 26.” “Where’s everything else?”
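As a rough illustration of why pairwise comparisons order the species this way, the sketch below converts %ID values into distances (fraction of non-identical positions) and ranks species by their mean distance to S26. It is a crude stand-in, not the algorithm behind the BLAST™ “Distance tree of results”; the %ID values are three ranges copied from Table 2 and the function names are hypothetical.

```python
from statistics import mean

# %ID to S26 over three query ranges, copied from Table 2.
PCT_ID = {
    "polar bear": [97.86, 98.90, 99.23],
    "giant panda": [96.45, 97.67, 98.64],
    "human": [92.27, 92.46, 93.87],
}


def mean_distance(pct_ids):
    """Mean fraction of non-identical positions, i.e. 1 - %ID/100, over the ranges."""
    return mean(1.0 - p / 100.0 for p in pct_ids)


def rank_by_distance(pct_id_by_species):
    """Species names ordered from closest to most distant."""
    return sorted(pct_id_by_species, key=lambda s: mean_distance(pct_id_by_species[s]))


if __name__ == "__main__":
    for species in rank_by_distance(PCT_ID):
        print(f"{species}: {mean_distance(PCT_ID[species]):.4f}")
```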


For comparison to Fig. 2, Fig. 5 shows the currently accepted Ursus phylogeny compared to a few select other carnivores [12]. The correspondence of S26 in Fig. 2 to black bear in Fig. 5 is excellent and unequivocal. Other species are also in the same relative positions in both figures.




CONCLUSIONS


The polar bear is the best RNA ref_seq match to S26. There are no black bear data in this database. However, the phylotree – distance tree of results – shows S26 in exactly the position that would be occupied by a black bear, distinct from the polar bear. Also, because the sample was collected in California, we believe it is a black bear, the only extant bear there. This was the fifth database in GenBank which supports a bear conclusion unequivocally. Also, three independent laboratory DNA analyses indicated a black bear [3].


Again we see the need to do multiple structured searches, sometimes against multiple databases, and to download and sort the hits to unravel the identity of a sequence, especially in the light of conserved genes. Settling for a single search of a single database such as our “mammals only” search and only looking at the top score in Table 1 and not %ID would have led to a false conclusion that the Pacific walrus (Odobenus rosmarus divergens) was the best match. This was the mistake of Ketchum et al. when they only searched the Nucleotide database (which contained no polar bear data and little black bear data) and concluded a human match for this sample even though the %IDs averaged only 94-95%, as explained previously [2, 7]. A species match requires 99+%ID. As noted previously, “One cannot match what is not in the database.” [2]
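The difference between ranking by score alone and also checking %ID can be sketched with the twelfth group of Table 1, where the Pacific walrus hit has the higher score but only the shorter polar bear hit clears the 99% level. The threshold constant and helper names are hypothetical; the numbers are copied from Table 1.

```python
SPECIES_MATCH_PCT_ID = 99.0  # species-level threshold stated in the text

# (species, %ID, score) for the two RNA ref_seq hits in the twelfth group of Table 1.
RNA_HITS = [
    ("Pacific walrus", 97.30, 1999),
    ("polar bear", 99.89, 1635),
]


def top_by_score(hits):
    """The hit a score-only reading would report as the best match."""
    return max(hits, key=lambda h: h[2])


def species_level(hits, threshold=SPECIES_MATCH_PCT_ID):
    """Hits whose %ID reaches the species-match threshold."""
    return [h for h in hits if h[1] >= threshold]


if __name__ == "__main__":
    print("top by score:", top_by_score(RNA_HITS)[0])
    print("at or above threshold:", [h[0] for h in species_level(RNA_HITS)])
```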


The Ketchum et al. conclusion, “…the species (sasquatch) possesses a novel mosaic pattern of nuclear DNA comprising novel sequences that are related to primates interspersed with sequences that are closely homologous to humans.” IS WRONG. Sample 26 is from a black bear.


ACKNOWLEDGEMENT
 

The author received no financial or other material support for this work.


CONFLICT OF INTEREST



The author declares no conflicting interests.


REFERENCES



[1] See Sasquatch Genome Project link at right.


[2] See Paper 1 links at right.


[3] See The “Tyler Huggins Report” under Pages at right and on this blog, November 26, 2014, “Ketchum Sample 26, The Smeja Kill: Independent Lab Reports.”


[4] See on this blog, November 30, 2014, “Table 1 Updated: The Ketchum Sample 26 nDNA.”


[5] See on this blog, May 22, 2015, “New Black Bear Data Show Ketchum Sample 26 (the Smeja Kill) is a Bear.”


[6] See Paper 3 link at right.


[7] See on this blog, December 30, 2014, “Melba Ketchum’s Experts and Their Mistakes: What’s in a Phylotree.”


[8] See Paper 2 link at right.


[9] See on this blog, December 29, 2014, “Melba Ketchum Shows Sample 140 Degradation in her YouTube Video.”


[10] See on this blog, September 1, 2014, “My Response to Melba Ketchum’s Facebook Post About Me.” 




[11] For taxonomy see:
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=33554.



[12] Cronin, M. A. et al., “Molecular Phylogeny and SNP Variation of Polar Bears (Ursus maritimus), Brown Bears (U. arctos), and Black Bears (U. americanus) Derived from Genome Sequences,” Journal of Heredity 2014:105(3), pp. 312–323.

 


Table 1.  Best 15 Mammal Hits



Accession [a]     %ID [b]  LEN [c]  MIS [d]  GAP [e]   Start [f]  End [g]  Score [h]  Species

XM_004394587.1      96.58     2136       45        2      189026   191136       3515  Pacific walrus
no match, RNA           -        -        -        -           -        -          -  polar bear
NW_007929448.1      98.83     2134        0        1      189028   191136       3779  polar bear

XM_008704386.1      97.86     1679        2        4     1655920  1657569       2870  polar bear

XM_008688345.1       98.9     1460        1        2     1761456  1762900       2593  polar bear

XM_008686832.1       98.6     1359        1        1      759948   761288       2388  polar bear

XM_008701063.1      99.05     1267        2        2      855387   856643       2265  polar bear

XM_004752061.1      95.82     1388       39        9      312149   313520       2224  domestic ferret
no match, RNA           -        -        -        -           -        -          -  polar bear
NW_007907318.1      99.56     1374        3        2      312149   313520       2501  polar bear

XM_011235586.1      99.65     1149        4        0     2258573  2259721       2100  polar bear

XM_008688389.1      99.23     1174        4        1     1835440  1836608       2113  polar bear

XM_003780852.1      94.43     1364       48        9     1657662  1659004       2073  galago
XM_008704386.1      99.67      304        1        0     1657662  1657965        556  polar bear
NW_007907230.1      99.87      787        1        0     1658089  1658875       1448  polar bear*
NW_007907230.1      99.77      429        1        0     1657662  1658090        787  polar bear*

XM_008707444.1      98.96     1151        0        2     1508093  1509231       2049  polar bear

XM_008686832.1      99.64     1104        3        1      756646   757748       2015  polar bear

XM_004412889.1       97.3     1183       23        2      602055   603228       1999  Pacific walrus
XM_008686943.1      99.89      888        1        0      602341   603228       1635  polar bear**
NW_003218343.1      98.99     1183        3        2      602055   603228       2109  giant panda+

XM_008690191.1      99.81     1050        2        0     2257336  2258385       1929  polar bear

XM_008688318.1        100     1040        0        0     2586084  2587123       1921  polar bear

XM_011750120.1      97.83     1108       20        1      363058   364161       1910  macaque
no match, RNA           -        -        -        -           -        -          -  polar bear
NW_007907318.1        100     1103        0        0      363059   364161       2037  polar bear

 Footnotes [a] - [h]: Same as in Table 2. 

 *  Combine these two rows (adjacent ranges of the same accession).
**  Shorter sequence, but better %ID match.
+   Polar bear match had long gap, probably a sequencing error.
galago is short-eared galago.
macaque is pig-tailed macaque.



Table 2.  Best 15 Polar Bear Hits



Accession [a]     %ID [b]  LEN [c]  MIS [d]  GAP [e]   Start [f]  End [g]  Score [h]  Species

XM_008704386.1      97.86     1679        2        4  1655920[i]  1657569       2870  polar bear
XM_011222467.1      96.45     1688        7       13     1655920  1657569       2736  giant panda
XM_011545195.1      92.27     1695       73       30     1655921  1657569       2351  human

XM_008688345.1       98.9     1460        1        2  1761456[i]  1762900       2593  polar bear
XM_002920733.1      97.67     1460       19        2     1761456  1762900       2494  giant panda
XM_011542584.1      92.46     1459       95        2     1761457  1762900       2071  human

XM_008686832.1       98.6     1359        1        1   759948[i]   761288       2388  polar bear
XM_004755948.1      98.08     1353       11        8      759948   761288       2340  domestic ferret
XM_011519835.1      94.76     1354       35        8      759948   761288       2074  human

XM_008701063.1      99.05     1267        2        2   855387[i]   856643       2265  polar bear
XM_006728292.1      97.71     1266       18        3      855387   856643       2167  Weddell seal
XM_011520245.1      95.03     1268       48        8      855386   856643       1978  human

XM_008688389.1      99.23     1174        4        1  1835440[i]  1836608       2113  polar bear
XM_002925708.2      98.64     1174       11        1     1835440  1836608       2074  giant panda
NM_001261833.1      93.87     1175       65        3     1835440  1836608       1764  human

XM_008690191.1      99.65     1149        4        0  2258573[i]  2259721       2100  polar bear
XM_004412631.1      98.96     1149       12        0     2258573  2259721       2056  Pacific walrus
XM_011542831.1      96.34     1147       42        0     2258575  2259721       1886  human

XM_008707444.1      98.96     1151        0        2  1508093[i]  1509231       2049  polar bear
XM_004795929.1      96.04     1161       16       11     1508093  1509231       1862  domestic ferret
XM_005274051.2      93.29     1147       62        6     1508092  1509231       1677  human

XM_008686832.1      99.64     1104        3        1   756646[i]   757748       2015  polar bear
XM_004406102.1      98.82     1104       12        1      756646   757748       1965  Pacific walrus
XM_005252729.2      96.74     1104       35        1      756646   757748       1838  human

XM_008690191.1      99.81     1050        2        0     2257336  2258385       1929  polar bear
XM_011235586.1      99.43     1050        6        0     2257336  2258385       1906  giant panda
XM_011542831.1      95.24     1050       50        0     2257336  2258385       1663  human

XM_008688318.1        100     1040        0        0     2586084  2587123       1921  polar bear
XM_002912684.2      99.42     1040        6        0     2586084  2587123       1888  giant panda
XM_006718826.2      96.06     1040       41        0     2586084  2587123       1694  human

XM_008704414.1      98.67     1054        0        3  1663527[j]  1664579       1857  polar bear
XM_002917304.2      97.91     1054        8        3     1663527  1664579       1812  giant panda
XM_005274336.2      94.53     1060       49        8     1663527  1664579       1628  human

XM_008690205.1      99.31     1012        4        1  2102443[j]  2103451       1827  polar bear
XM_011233674.1      98.62     1015        8        3     2102443  2103451       1792  giant panda
NM_001301044.1      94.22     1021       44       10     2102443  2103451       1544  human

XM_008711943.1      99.58      954        3        1   184464[j]   185417       1738  polar bear
XM_005633808.1      99.27      958        5        2      184460   185417       1729  dog
NM_001172705.1      98.12      956       15        3      184463   185417       1663  human

XM_008686832.1      98.95      953        1        4      761289   762232       1696  polar bear
XM_002917485.2      98.53      954        4        5      761289   762232       1676  giant panda
XM_011519835.1      95.61      957       22       14      761289   762232       1517  human

XM_008701063.1      97.41     1005        5        2      856642   857646       1694  polar bear
XM_004790381.1       97.2     1000       28        0      856642   857641       1692  domestic ferret
XM_011520245.1       94.8     1000       52        0      856642   857641       1559  human




[a] In NCBI GenBank.

[b] Percentage of matching base pairs (bp) over the sequence range.

[c] Sequence length, bp.

[d] Mismatches, bp.


[e] Number of gaps (not number of bp in gaps)

[f] Starting position of match in bp along S26 sequence.

[g] End position of match in bp along S26 sequence.

[h] Score, see NCBI BLAST™ Handbook.

[i] Same sequence range in Paper 1.

[j] Same sequence range in EST blog, May 23, 2015.






Fig. 1.  Phylotree of All Mammal Hits







Fig. 1.  Taken directly from BLAST(TM) Distance tree of results with common names added.   Distance to European rabbit on top was shortened to compact.  "Unknown" = S26.



Fig. 2.  Expanded Phylotree - Carnivores Only




Fig. 2.  Taken directly from BLAST(TM) Distance tree of results with common names added in red.



Fig. 3.  Ketchum Supp. Fig. 5 as Redrawn in [7]





Fig. 3.  Ketchum Supp. Fig. 5 [1] as redrawn in [7].  Unknown sample number was not stated in the Ketchum paper.  Numbers  are distances (fraction of mismatches) to the unknown, calculated from the original Ketchum Supp. Fig. 5.          
 




Fig. 4.  Ketchum Supp. Fig. 6 as Redrawn in [7]




Fig. 4.  Ketchum Supp. Fig. 6 in [1] as redrawn in [7].  Unknown sample number was not stated in the Ketchum paper.  Numbers  are distances (fraction of mismatches) to the unknown, calculated from the original Ketchum Supp. Fig. 6.          





Fig. 5.  Select Carnivore Phylotree [12]



Fig. 5.  Taken unchanged from Ref. [12].  Not a complete phylotree of all known species.  Other carnivores were omitted, e.g. pinnipeds (seals, walrus) and more bears.   Domestic ferret (Fig. 2) is a subspecies of Polecat (Fig. 3).  ABC = Admiralty, Baranof, and Chichagof Islands (in southeast Alaska) where brown bears have significant polar bear DNA due to past hybridization.