ABSTRACT
The Ketchum Sample 26 nDNA sequence (2,726,786 bp long) was searched in its entirety against the RNA reference genomic sequence (RNA ref_seq) database of the National Center for Biotechnology Information (NCBI), commonly known as GenBank. The matches to polar bear were far better than those to any other species. Primates, including human, were far poorer matches than the next best non-bear matches. A phylotree showed Sample 26 in precisely the position that a black bear would occupy. Since Sample 26 is from California and since there are no black bear data in this database, the polar bear is the surrogate match from the same genus. Sample 26 is a black bear.
INTRODUCTION
The conclusion of the Ketchum paper, “Novel North American Hominins…..” [1], unsupported by any published sequence comparisons, was that:
“…the species (sasquatch) possesses a novel mosaic pattern of nuclear DNA comprising novel sequences that are related to primates interspersed with sequences that are closely homologous to humans.”
However, we showed previously, with multiple sequence comparisons, that Sample 26 (S26) – the Smeja kill – is a bear, most likely a black bear (Ursus americanus)[2], in agreement with three separate laboratory analyses [3], and that Sample 140 is a dog [2]. Only Sample 31 matched a human best [2]. Primates, including humans, were not even close to matching Samples 26 and 140. Reference [2] involved searching the “Nucleotide”, “Transcriptome shotgun analysis” (TSA) and “Ref_seq genomic” (RSG) databases. At that time all polar bear data was confined to the TSA database and was missed by the Ketchum team, and there was little black bear data in any of these databases. Consequently our best hits were polar bear (Ursus maritimus) or giant panda (Ailuropoda melanoleuca). We updated our results later when a whole polar bear genome was added to the RSG database [4]. As expected it was now the best hit, as it is in the same genus (Ursus) as the black bear, whereas giant panda is not.
Later still, we searched the “Expressed sequence tags” (EST) database, which has black bear data, and found it matched S26 better than any other species.[5] For sequences where there was no black bear data, the best other species hits were outmatched by polar bear data from the RSG database. In only one out of 59 best hit sequences was a dog the best match. Other results showed the presence of human, dog, and bear DNA in S26 [6] and the fallacy of the Ketchum consultants’ methodology [7]. Other problems with the mtDNA were also addressed [8], including degradation [9].
The volume and consistency of these results should have convinced anyone of the bear origin of S26. However, some remain unconvinced. See Reference [10] for Melba Ketchum’s reasons for remaining so. Hence, we search yet another database here, the “RNA ref_seq” database. The database contains RNA sequences, including polar bear but not black bear. RNA (ribonucleic acid) is the complement of DNA (deoxyribonucleic acid): base U pairs with A, A with T, C with G, and G with C. It uses three-base codons for amino acids to manufacture proteins, the workhorses of the cell. Any biochemistry text will explain this in more detail. The takeaway point here is that these RNA sequences can be searched against DNA queries – the software makes the base conversions mentioned above. In this paper we search the previously untapped RNA ref_seq database with the entire 2,726,786 bp S26 nDNA sequence from [1]. Our results and conclusions corroborate our previous findings, but contrast sharply with the Ketchum et al. conclusion above.
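The base conversion mentioned above can be sketched in a few lines of Python. This is only an illustration of complementary base pairing, not a description of how BLAST™ works internally; the function name `dna_to_complementary_rna` is made up for this sketch.

```python
# Illustrative only: complementary base pairing between a DNA strand and RNA.
# DNA base -> complementary RNA base: A->U, T->A, C->G, G->C.
DNA_TO_RNA = str.maketrans("ATCG", "UAGC")


def dna_to_complementary_rna(dna: str) -> str:
    """Return the RNA complement of a DNA string, base by base (not reversed)."""
    dna = dna.upper()
    unknown = set(dna) - set("ATCG")
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    return dna.translate(DNA_TO_RNA)


# A short made-up DNA fragment, not taken from S26.
print(dna_to_complementary_rna("ATGCCGTA"))
```

The fragment used here is arbitrary; any string over A, T, C and G works the same way.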
COMPUTATIONAL METHODS
All searches were conducted with the BLAST™ software (http://blast.be-md.ncbi.nlm.nih.gov/Blast.cgi) of the NCBI, and hits were downloaded as Excel files as described previously [2]. Search parameters were default except that for whole S26 sequence searches maximum hits was set to 5000 and word size to 64. This is the same search software used by Ketchum et al. [1]. Her claim that “He didn't use bioinformatics software which has to be done in order to evaluate the data.”[10] is false. The “BI” in NCBI means “biotechnology information” = “bioinformatics.” One does not need extra “bioinformatics” software to compare two numbers such as %IDs or scores. The hard part, which Ketchum et al. totally failed at, is generating relevant sequences to compare through appropriate searches. The phylotrees (Figs. 1 and 2) were constructed just as the Ketchum Supp. Figs. 5 and 6 (reproduced here as Figs. 3 and 4) were, with the BLAST™ feature “Distance tree of results”, as discussed below.
Another Ketchum claim, “You can't use only statistics to evaluate the sequences nor by tearing it down into little sequences unless you have software/expertise to do so.” is a red herring. My procedures as described in [1], and used throughout, do not involve “tearing it down into little sequences”. That was her consultants’ approach, which was described by an NCBI contact of ours as “makes no sense”. Only in [7] did we break the Ketchum nDNA sequences down to compare results to her consultants’. Even then I got better matches to a bear and a dog for S26 and S140, respectively, with their methodology. You just have to search in the right databases. Incidentally, statistics are widely used in genetics, including the use of the Poisson distribution of mutations, which is what we did in [8] to show that some of the Ketchum mtDNA sequences were outside the range of normal humans. They were not our “only” tool, and in fact were “only” used once in the case of the mtDNA sequences in [8].
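As a rough illustration of the kind of Poisson reasoning mentioned above (the actual analysis is in [8]), the sketch below computes the probability of seeing at least a given number of mutations when a smaller number is expected. The values of `expected` and `observed` are invented for the example and are not taken from [8].

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


expected = 5.0  # hypothetical mean number of mutations
observed = 15   # hypothetical observed number of mutations
p = poisson_tail(observed, expected)
print(f"P(X >= {observed} | mean {expected}) = {p:.2e}")
```

A very small tail probability is what would flag a sequence as lying outside the expected range.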
Our computational methods are sound and used by geneticists everywhere in some form or another. We applied them across multiple databases and made conclusions based on known phylogeny and taxonomy. Ketchum et al. did not.
RESULTS AND DISCUSSION
The RNA ref_seq database was queried with the entire S26 sequence, 5000 max hit entries, word size = 64, limited to mammals. A list of 27,135 total hits (more than one per database entry) was downloaded and sorted by score, then %ID. The best 30 hits by score were culled and examined. Table 1 shows the best 15 hits (plain text). Ten were polar bear (the only bear in this database), three had no polar bear sequences in the same range, and two had shorter, but higher %ID polar bear matches (in bold italics). Over these latter five ranges, searches of the Reference genomic sequences (refseq_genomic) database, limited to bears, produced the underlined hits in Table 1. In these cases, polar bear was the best match by score and %ID.
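The sort-and-cull step described above was done on the downloaded Excel files; a minimal Python sketch of the same idea follows. The five rows are copied from Table 1, while the tuple layout and the function name `rank_hits` are assumptions made for this illustration.

```python
# A few RNA ref_seq hits copied from Table 1: (accession, %ID, score, species)
hits = [
    ("XM_008688345.1", 98.90, 2593, "polar bear"),
    ("XM_004394587.1", 96.58, 3515, "Pacific walrus"),
    ("XM_008704386.1", 97.86, 2870, "polar bear"),
    ("XM_003780852.1", 94.43, 2073, "galago"),
    ("XM_008688318.1", 100.00, 1921, "polar bear"),
]


def rank_hits(rows, top_n):
    """Sort by score, then %ID (both descending) and keep the first top_n rows."""
    return sorted(rows, key=lambda r: (r[2], r[1]), reverse=True)[:top_n]


for acc, pid, score, species in rank_hits(hits, 3):
    print(f"{acc}  score={score}  %ID={pid:.2f}  {species}")
```

Sorting on the tuple `(score, %ID)` uses %ID only to break ties in score, which is one reading of "sorted by score, then %ID".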
The RNA ref_seq database was again queried, this time limited to polar bear. The best 15 hits are listed in Table 2 (plain text). Ten of these are the same as the polar bear hits in Table 1. The best non-polar-bear hits over the same hit ranges from the mammals list were added to Table 2 (bold italic). A query limited to human only was performed and the best human matches over the same hit ranges were added to Table 2 and underlined. Ten of these sequence ranges matched ranges for best hits in [1] or [5]. In every case the polar bear was the best match by score and %ID. Human matches were a distant third.
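The per-range comparison above can be checked mechanically. The sketch below takes three of the fifteen ranges from Table 2 and reports which species wins on %ID and on score; the dictionary layout and the helper `best_by` are assumptions for illustration only.

```python
# Three sequence ranges from Table 2, keyed by the polar bear hit range on S26.
# Each entry: (species, %ID, score)
ranges = {
    (1655920, 1657569): [
        ("polar bear", 97.86, 2870),
        ("giant panda", 96.45, 2736),
        ("human", 92.27, 2351),
    ],
    (759948, 761288): [
        ("polar bear", 98.60, 2388),
        ("domestic ferret", 98.08, 2340),
        ("human", 94.76, 2074),
    ],
    (184464, 185417): [
        ("polar bear", 99.58, 1738),
        ("dog", 99.27, 1729),
        ("human", 98.12, 1663),
    ],
}


def best_by(rows, index):
    """Species with the highest value in the given column (1 = %ID, 2 = score)."""
    return max(rows, key=lambda r: r[index])[0]


summary = {rng: (best_by(rows, 1), best_by(rows, 2)) for rng, rows in ranges.items()}
for (start, end), (by_id, by_score) in summary.items():
    print(f"{start}-{end}: best %ID = {by_id}, best score = {by_score}")
```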
Phylotrees are comparisons to multiple species, based on pairwise comparisons to their many sequences, and as such offer much more proof of identity than any single match does. A phylotree (distance tree of results) was produced from BLAST™ results (mammal) as Fig. 1. It clearly shows S26 in a close phylogenetic relationship to carnivores, especially bears. Notice the very distant relationship to all primates, including human. See the NCBI taxonomy database for comparison [11]. The unopened leaves (not shown in Fig. 1) revealed many of the species seen previously based on the RSG database [7] and in the same relative positions. Fig. 2 shows the expanded carnivore leaves from Fig. 1. Notice, as expected, walrus, seal, dog, ferret, polar bear and panda from Table 2. Contrast these phylotrees with the Ketchum et al. conclusion in the INTRODUCTION, and with the meaningless Ketchum phylotrees (Supplemental Figures 5 and 6 in [1]) as discussed in [7]. Those phylotrees are reproduced here for comparison as Figs. 3 and 4, taken from [7] and based on [1]. As we commented before,[7] “Simply stated, chicken, fish, mouse, and human are too distantly related to each other to be, as a group, the most related species to the Ketchum Sample 26.” “Where’s everything else?”
For comparison to Fig. 2, Fig. 5 shows the currently accepted Ursus phylogeny compared to a few select other carnivores [12]. The correspondence of S26 in Fig. 2 to black bear in Fig. 5 is excellent and unequivocal. Other species are also in the same relative positions in both figures.
CONCLUSIONS
The polar bear is the best RNA ref_seq match to S26. There are no black bear data in this database. However, the phylotree – distance tree of results – shows S26 in exactly the position that would be occupied by a black bear, distinct from the polar bear. Also, because the sample was collected in California, we believe it is a black bear, the only extant bear there. This was the fifth database in GenBank which supports a bear conclusion unequivocally. Also, three independent laboratory DNA analyses indicated a black bear [3].
Again we see the need to do multiple structured searches, sometimes against multiple databases, and to download and sort the hits to unravel the identity of a sequence, especially in the light of conserved genes. Settling for a single search of a single database such as our “mammals only” search and only looking at the top score in Table 1 and not %ID would have led to a false conclusion that the Pacific walrus (Odobenus rosmarus divergens) was the best match. This was the mistake of Ketchum et al. when they only searched the Nucleotide database (which contained no polar bear data and little black bear data) and concluded a human match for this sample even though the %IDs averaged only 94-95%, as explained previously [2, 7]. A species match requires 99+%ID. As noted previously, “One cannot match what is not in the database.” [2]
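The difference between ranking by score and ranking by %ID can be seen on a single range of Table 1 (S26 positions 602055 to 603228, not the top-scoring range). The sketch below uses the two RNA ref_seq hits listed there and the 99% figure quoted above; the constant and function names are assumptions for illustration.

```python
SPECIES_MATCH_MIN_PID = 99.0  # threshold quoted in the text ("99+%ID")

# RNA ref_seq hits from Table 1 overlapping S26 positions 602055-603228:
# (accession, species, %ID, aligned length, score)
hits = [
    ("XM_004412889.1", "Pacific walrus", 97.30, 1183, 1999),
    ("XM_008686943.1", "polar bear", 99.89, 888, 1635),
]

top_by_score = max(hits, key=lambda h: h[4])
top_by_pid = max(hits, key=lambda h: h[2])


def meets_threshold(hit):
    """True if the hit's %ID clears the 99% bar."""
    return hit[2] >= SPECIES_MATCH_MIN_PID


print("top by score:", top_by_score[1], "clears 99%:", meets_threshold(top_by_score))
print("top by %ID:  ", top_by_pid[1], "clears 99%:", meets_threshold(top_by_pid))
```

Score rewards the longer walrus alignment, while %ID favors the shorter polar bear alignment, which is why both columns have to be read together.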
The Ketchum et al. conclusion, “…the species (sasquatch) possesses a novel mosaic pattern of nuclear DNA comprising novel sequences that are related to primates interspersed with sequences that are closely homologous to humans.” IS WRONG. Sample 26 is from a black bear.
ACKNOWLEDGEMENT
The author received no financial or other material support for this work.
CONFLICT OF INTEREST
The author declares no conflicting interests.
REFERENCES
[1] See Sasquatch Genome Project link at right.
[2] See Paper 1 links at right.
[3] See The “Tyler Huggins Report” under Pages at right and on this blog, November 26, 2014, “Ketchum Sample 26, The Smeja Kill: Independent Lab Reports.”
[4] See on this blog, November 30, 2014, “Table 1 Updated: The Ketchum Sample 26 nDNA.”
[5] See on this blog, May 22, 2015, “New Black Bear Data Show Ketchum Sample 26 (the Smeja Kill) is a Bear.”
[6] See Paper 3 link at right.
[7] See on this blog, December 30, 2014, “Melba Ketchum’s Experts and Their Mistakes: What’s in a Phylotree.”
[8] See Paper 2 link at right.
[9] See on this blog, December 29, 2014, “Melba Ketchum Shows Sample 140 Degradation in her YouTube Video.”
[10] See on this blog, September 1, 2014, “My Response to Melba Ketchum’s Facebook Post About Me.”
[11] For taxonomy see: http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=33554.
[12] Cronin, M. A. et al., “Molecular Phylogeny and SNP Variation of Polar Bears (Ursus maritimus), Brown Bears (U. arctos), and Black Bears (U. americanus) Derived from Genome Sequences,” Journal of Heredity 2014:105(3), pp. 312–323.
Table 1. Best 15 Mammal Hits
| Accession [a] | %ID [b] | LEN [c] | MIS [d] | GAP [e] | Start [f] | End [g] | Score [h] | Species |
|---|---|---|---|---|---|---|---|---|
| XM_004394587.1 | 96.58 | 2136 | 45 | 2 | 189026 | 191136 | 3515 | Pacific walrus |
| no match, RNA | | | | | | | | polar bear |
| NW_007929448.1 | 98.83 | 2134 | 0 | 1 | 189028 | 191136 | 3779 | polar bear |
| XM_008704386.1 | 97.86 | 1679 | 2 | 4 | 1655920 | 1657569 | 2870 | polar bear |
| XM_008688345.1 | 98.9 | 1460 | 1 | 2 | 1761456 | 1762900 | 2593 | polar bear |
| XM_008686832.1 | 98.6 | 1359 | 1 | 1 | 759948 | 761288 | 2388 | polar bear |
| XM_008701063.1 | 99.05 | 1267 | 2 | 2 | 855387 | 856643 | 2265 | polar bear |
| XM_004752061.1 | 95.82 | 1388 | 39 | 9 | 312149 | 313520 | 2224 | domestic ferret |
| no match, RNA | | | | | | | | polar bear |
| NW_007907318.1 | 99.56 | 1374 | 3 | 2 | 312149 | 313520 | 2501 | polar bear |
| XM_011235586.1 | 99.65 | 1149 | 4 | 0 | 2258573 | 2259721 | 2100 | polar bear |
| XM_008688389.1 | 99.23 | 1174 | 4 | 1 | 1835440 | 1836608 | 2113 | polar bear |
| XM_003780852.1 | 94.43 | 1364 | 48 | 9 | 1657662 | 1659004 | 2073 | galago |
| XM_008704386.1 | 99.67 | 304 | 1 | 0 | 1657662 | 1657965 | 556 | polar bear |
| NW_007907230.1 | 99.87 | 787 | 1 | 0 | 1658089 | 1658875 | 1448 | polar bear* |
| NW_007907230.1 | 99.77 | 429 | 1 | 0 | 1657662 | 1658090 | 787 | polar bear* |
| XM_008707444.1 | 98.96 | 1151 | 0 | 2 | 1508093 | 1509231 | 2049 | polar bear |
| XM_008686832.1 | 99.64 | 1104 | 3 | 1 | 756646 | 757748 | 2015 | polar bear |
| XM_004412889.1 | 97.3 | 1183 | 23 | 2 | 602055 | 603228 | 1999 | Pacific walrus |
| XM_008686943.1 | 99.89 | 888 | 1 | 0 | 602341 | 603228 | 1635 | polar bear** |
| NW_003218343.1 | 98.99 | 1183 | 3 | 2 | 602055 | 603228 | 2109 | giant panda+ |
| XM_008690191.1 | 99.81 | 1050 | 2 | 0 | 2257336 | 2258385 | 1929 | polar bear |
| XM_008688318.1 | 100 | 1040 | 0 | 0 | 2586084 | 2587123 | 1921 | polar bear |
| XM_011750120.1 | 97.83 | 1108 | 20 | 1 | 363058 | 364161 | 1910 | macaque |
| no match, RNA | | | | | | | | polar bear |
| NW_007907318.1 | 100 | 1103 | 0 | 0 | | | 2037 | polar bear |
Footnotes [a] - [h]: Same as in Table 2.
* Combine
** Shorter sequence, but better %ID match.
+ Polar bear match had long gap, probably a sequencing error.
galago is short-eared galago.
macaque is pig-tailed macaque.
Table 2. Best 15 Polar Bear Hits
| Accession [a] | %ID [b] | LEN [c] | MIS [d] | GAP [e] | Start [f] | End [g] | Score [h] | Species |
|---|---|---|---|---|---|---|---|---|
| XM_008704386.1 | 97.86 | 1679 | 2 | 4 | 1655920[i] | 1657569 | 2870 | polar bear |
| XM_011222467.1 | 96.45 | 1688 | 7 | 13 | 1655920 | 1657569 | 2736 | giant panda |
| XM_011545195.1 | 92.27 | 1695 | 73 | 30 | 1655921 | 1657569 | 2351 | human |
| XM_008688345.1 | 98.9 | 1460 | 1 | 2 | 1761456[i] | 1762900 | 2593 | polar bear |
| XM_002920733.1 | 97.67 | 1460 | 19 | 2 | 1761456 | 1762900 | 2494 | giant panda |
| XM_011542584.1 | 92.46 | 1459 | 95 | 2 | 1761457 | 1762900 | 2071 | human |
| XM_008686832.1 | 98.6 | 1359 | 1 | 1 | 759948[i] | 761288 | 2388 | polar bear |
| XM_004755948.1 | 98.08 | 1353 | 11 | 8 | 759948 | 761288 | 2340 | domestic ferret |
| XM_011519835.1 | 94.76 | 1354 | 35 | 8 | 759948 | 761288 | 2074 | human |
| XM_008701063.1 | 99.05 | 1267 | 2 | 2 | 855387[i] | 856643 | 2265 | polar bear |
| XM_006728292.1 | 97.71 | 1266 | 18 | 3 | 855387 | 856643 | 2167 | Weddell seal |
| XM_011520245.1 | 95.03 | 1268 | 48 | 8 | 855386 | 856643 | 1978 | human |
| XM_008688389.1 | 99.23 | 1174 | 4 | 1 | 1835440[i] | 1836608 | 2113 | polar bear |
| XM_002925708.2 | 98.64 | 1174 | 11 | 1 | 1835440 | 1836608 | 2074 | giant panda |
| NM_001261833.1 | 93.87 | 1175 | 65 | 3 | 1835440 | 1836608 | 1764 | human |
| XM_008690191.1 | 99.65 | 1149 | 4 | 0 | 2258573[i] | 2259721 | 2100 | polar bear |
| XM_004412631.1 | 98.96 | 1149 | 12 | 0 | 2258573 | 2259721 | 2056 | Pacific walrus |
| XM_011542831.1 | 96.34 | 1147 | 42 | 0 | 2258575 | 2259721 | 1886 | human |
| XM_008707444.1 | 98.96 | 1151 | 0 | 2 | 1508093[i] | 1509231 | 2049 | polar bear |
| XM_004795929.1 | 96.04 | 1161 | 16 | 11 | 1508093 | 1509231 | 1862 | domestic ferret |
| XM_005274051.2 | 93.29 | 1147 | 62 | 6 | 1508092 | 1509231 | 1677 | human |
| XM_008686832.1 | 99.64 | 1104 | 3 | 1 | 756646[i] | 757748 | 2015 | polar bear |
| XM_004406102.1 | 98.82 | 1104 | 12 | 1 | 756646 | 757748 | 1965 | Pacific walrus |
| XM_005252729.2 | 96.74 | 1104 | 35 | 1 | 756646 | 757748 | 1838 | human |
| XM_008690191.1 | 99.81 | 1050 | 2 | 0 | 2257336 | 2258385 | 1929 | polar bear |
| XM_011235586.1 | 99.43 | 1050 | 6 | 0 | 2257336 | 2258385 | 1906 | giant panda |
| XM_011542831.1 | 95.24 | 1050 | 50 | 0 | 2257336 | 2258385 | 1663 | human |
| XM_008688318.1 | 100 | 1040 | 0 | 0 | 2586084 | 2587123 | 1921 | polar bear |
| XM_002912684.2 | 99.42 | 1040 | 6 | 0 | 2586084 | 2587123 | 1888 | giant panda |
| XM_006718826.2 | 96.06 | 1040 | 41 | 0 | 2586084 | 2587123 | 1694 | human |
| XM_008704414.1 | 98.67 | 1054 | 0 | 3 | 1663527[j] | 1664579 | 1857 | polar bear |
| XM_002917304.2 | 97.91 | 1054 | 8 | 3 | 1663527 | 1664579 | 1812 | giant panda |
| XM_005274336.2 | 94.53 | 1060 | 49 | 8 | 1663527 | 1664579 | 1628 | human |
| XM_008690205.1 | 99.31 | 1012 | 4 | 1 | 2102443[j] | 2103451 | 1827 | polar bear |
| XM_011233674.1 | 98.62 | 1015 | 8 | 3 | 2102443 | 2103451 | 1792 | giant panda |
| NM_001301044.1 | 94.22 | 1021 | 44 | 10 | 2102443 | 2103451 | 1544 | human |
| XM_008711943.1 | 99.58 | 954 | 3 | 1 | 184464[j] | 185417 | 1738 | polar bear |
| XM_005633808.1 | 99.27 | 958 | 5 | 2 | 184460 | 185417 | 1729 | dog |
| NM_001172705.1 | 98.12 | 956 | 15 | 3 | 184463 | 185417 | 1663 | human |
| XM_008686832.1 | 98.95 | 953 | 1 | 4 | 761289 | 762232 | 1696 | polar bear |
| XM_002917485.2 | 98.53 | 954 | 4 | 5 | 761289 | 762232 | 1676 | giant panda |
| XM_011519835.1 | 95.61 | 957 | 22 | 14 | 761289 | 762232 | 1517 | human |
| XM_008701063.1 | 97.41 | 1005 | 5 | 2 | 856642 | 857646 | 1694 | polar bear |
| XM_004790381.1 | 97.2 | 1000 | 28 | 0 | 856642 | 857641 | 1692 | domestic ferret |
| XM_011520245.1 | 94.8 | 1000 | 52 | 0 | 856642 | 857641 | 1559 | human |
[a] In NCBI GenBank.
[b] Percentage of matching base pairs (bp) over the sequence range.
[c] Sequence length, bp.
[d] Mismatches, bp.
[e] Number of gaps (not number of bp in gaps)
[f] Starting position of match in bp along S26 sequence.
[g] End position of match in bp along S26 sequence.
[h] Score, see NCBI BLAST™ Handbook.
[i] Same sequence range in Paper 1.
[j] Same sequence range in EST blog, May 23, 2015.
Fig. 3. Ketchum Supp. Fig. 5 as Redrawn in [7]
Fig. 3. Ketchum Supp. Fig. 5 [1] as redrawn in [7]. Unknown sample number was not stated in the Ketchum paper. Numbers are distances (fraction of mismatches) to the unknown, calculated from the original Ketchum Supp. Fig. 5.
Fig. 4. Ketchum Supp. Fig. 6 as Redrawn in [7]
Fig. 4. Ketchum Supp. Fig. 6 in [1] as redrawn in [7]. Unknown sample number was not stated in the Ketchum paper. Numbers are distances (fraction of mismatches) to the unknown, calculated from the original Ketchum Supp. Fig. 6.
Fig. 5. Select Carnivore Phylotree [12]
Fig. 5. Taken unchanged from Ref. [12]. Not a complete phylotree of all known species. Other carnivores were omitted, e.g. pinnipeds (seals, walrus) and more bears. Domestic ferret (Fig. 2) is a subspecies of Polecat (Fig. 3). ABC = Admiralty, Baranof, and Chichagof Islands (in southeast Alaska) where brown bears have significant polar bear DNA due to past hybridization.