PREFACE
I’ve been asked, “Why are we still talking about this?” (the Ketchum paper). At the risk of being presumptuous, I assume this came from those who are convinced that the evidence proves the Ketchum conclusions wrong. In any case,
1. Dr. Melba Ketchum, DVM, continues to appear on sensationalist radio shows arguing for her study and its conclusions before predisposed, sympathetic, biased interviewers and largely uninformed audiences.
2. She dismisses any opposing scientific analysis, not by addressing the technical details, but by claiming that her critics lack the background, education, institutional association, or software tools to make valid judgements about her work. Yet she doesn’t consistently apply the same criteria to her own coauthors and consultants, or to herself.
3. Matching DNA sequences at NCBI is a moving target, as new data are constantly being added. New data -->> New results -->> Maybe new conclusions (maybe not). Science is always in flux.
4. This has been a learning process for me. As I make new personal (not necessarily new, new) discoveries, I like to share them, which is the purpose of any blog. Others have found my work helpful.
5. Nobody, particularly from among Ketchum coauthors and consultants, has pointed out any inaccuracies in my work. I have tried unsuccessfully to engage these folks and hope eventually to hear from them. Some won’t even be identified by name. Science has no place for anonymity. If you can’t stand behind your work, why should anybody take it seriously? In this context I have to commend Dr. Ketchum for at least standing up for her work. I take it very seriously. The blog below took approximately 40 hr to complete and involved dozens of independent searches.
New Black Bear Data Show Ketchum Sample 26 (the Smeja Kill) is a Bear
by Haskell Hart
ABSTRACT
The nuclear DNA sequence of Sample 26 (the Smeja kill) of the Ketchum study, “Novel North American Hominins: Next Generation Sequencing of Three Whole Genomes and Associated Studies”, was compared to the “Expressed sequence tags” database of the National Center for Biotechnology Information (NCBI) with BLAST(TM) search software. Fifty-nine hits with %ID ≥ 99% and sequence length ≥ 200 bp were further compared to the whole genome polar bear sequence in the NCBI Reference genomic sequences database. Results showed a preference for a bear in 58 of 59 matches (one was a dog), with black bear and polar bear best matches about equal in number where both species could be compared. Once again, S26 proves to be a bear, most likely a black bear considering its California origin.
INTRODUCTION
Previous analysis [1] of the Ketchum Sample 26 (S26) - the Smeja kill - nDNA sequence [2] concluded that this sample was a bear, most likely a black bear (Ursus americanus). However, the paucity of black bear sequences in several NCBI databases searched prevented absolute species determination. Instead, giant panda (Ailuropoda melanoleuca) and polar bear (Ursus maritimus) sequences were nearly always the best hits compared to all other species. Later, after its whole genome became available for query in 2014, the polar bear alone had the best hits [3]. Since S26 was from California, where the black bear is the only extant bear, it was natural to assume a black bear origin. This work compares the S26 sequence to expressed sequence tags (EST) in the EST database of NCBI, a previously untapped database.
An EST is a nuclear cDNA (complementary DNA) sequence which is produced by cloning the corresponding mRNA sequence obtained from extracts from important organs, e.g. eyes, heart, lung, kidney, liver, or muscle [4]. Each sequence is usually less than 1000 bp (typically a few hundred bp), which makes amplification and sequencing relatively easy compared to whole genome methods. These sequences are often highly conserved (as organs function similarly among different species), so picking the “best” hit from among many “good” hits is critical.
Fortunately, the EST database contains a large number (38,757) of mostly nonredundant black bear sequences. There are no other bear sequences in this database. By comparison, the popular “nucleotide” database, the only one considered by the Ketchum team, contains only 1663 black bear sequences and 216,511 giant panda and 100,160 polar bear sequences as of this date. Thus, it seemed like an unusual opportunity to search the EST database for black bear (and, of course, other) matches.
COMPUTATIONAL METHODS
The S26 nDNA sequence, 2,726,786 bp, was previously downloaded from the online Ketchum paper [1] and, as the query, was not broken down into smaller segments, as is commonly done by others, including the Ketchum consultants. The smaller the segments are, the more likely the match to multiple species, because information is lost.
All searches were against the NCBI “Expressed sequence tags” database with the BLAST(TM) software. Search parameters were the default ones except that the word size was increased to 64 when the full S26 sequence was queried to conserve CPU and memory. The maximum number of hits was increased to 5,000 so as not to miss any significant matches. All search results were downloaded as Excel .csv files and converted to .xlsx files for further sorting and screening.
RESULTS AND DISCUSSION
To start with, we limited the query to black bear matches, which produced a promising list of 1503 black bear matches. The list was sorted by score and %ID. Considering the volume of data and the degree of match, it was decided to consider only those hits with length ≥ 200 bp and %ID ≥ 99% in all subsequent species comparisons. This sorted list served as a check against matches in the general search.
To consider all species matches, a general search was performed (maximum number of database hits 5000) with no species limitations. The list of 22,035 hits (a single database entry can yield multiple hits) was downloaded and sorted by %ID, then length. Culling this list by hand produced a list of all hits satisfying the above criteria. These 59 hits are seen in Table 1. Twenty-nine were black bear, 11 dog, eight human, six pig, one sheep, one deer mouse, one cow, and two that matched many species equally (including some plants).
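The screening step described above was done by hand in a spreadsheet, but it can also be sketched in a few lines of Python. This is only an illustration, not the procedure actually used: the column names (accession, pct_id, length, species) are assumptions about how a downloaded hit table might be labeled, the first two sample rows are copied from Table 1, and the third row is a made-up hit included only to show a rejection.

```python
import csv
import io

MIN_PCT_ID = 99.0   # %ID threshold stated in the text
MIN_LENGTH = 200    # alignment length threshold, bp


def filter_hits(rows, min_pct_id=MIN_PCT_ID, min_length=MIN_LENGTH):
    """Keep hits meeting both thresholds, sorted by %ID then length (descending)."""
    kept = [
        r for r in rows
        if float(r["pct_id"]) >= min_pct_id and int(r["length"]) >= min_length
    ]
    return sorted(
        kept,
        key=lambda r: (float(r["pct_id"]), int(r["length"])),
        reverse=True,
    )


# Column names are hypothetical. Rows 1-2 come from Table 1; row 3 is a
# made-up short, low-identity hit that should be dropped by the filter.
SAMPLE_CSV = """accession,pct_id,length,species
GW310394.1,100,292,black bear
AI573283.1,99.39,494,human
HYPOTHETICAL.1,97.50,150,example
"""

hits = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
selected = filter_hits(hits)
for h in selected:
    print(h["accession"], h["pct_id"], h["length"], h["species"])
```

A real export would need its own column names mapped onto these, and the sorted output would still have to be inspected for overlapping hits on the same S26 range.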
For those hits which were not a black bear (30), searches over the same S26 sequence ranges were performed against the “Reference genomic sequences” database, limited to polar bear (the closest species to black bear in that database). These hits are listed in bold italics immediately below the corresponding EST hits in Table 1, and are included in Fig. 1.
The black bear hits were also compared to polar bear in the Reference genomic sequences database. These results are also listed, underlined, immediately below each black bear hit in Table 1. These polar bear hits were not included in Fig. 1.
To ensure that all EST hits (bear and nonbear) were the best over their various sequence ranges, the entire hit list was scrutinized in the vicinities of these sequence ranges. The hits generally stood out by %ID and no other competing matches were found.
Overall results can be summarized as follows:
1. 29 of 59 best hits by score and %ID were black bear.
2. Of the 30 non-black-bear hits, 27 were better matched over the same sequence range by polar bear (bold italics) than by the non-bear EST hit. None of these hit sequence ranges had any black bear data in the database.
3. Two matches (nos. 15 and 40 in Table 1) were equal: between pig and polar bear, and between many species and polar bear, respectively.
4. Only one best match (no. 57 in Table 1) was not a bear but a dog. For confirmation, the Reference genomic sequences database was queried limited to dog (third line for no. 57). This match was also better than the polar bear match.
5. Comparing polar bear (underlined) to black bear, 11 of these polar bear hits were better than the corresponding black bear hits, five were equal, and 13 black bear hits were better than polar bear. Many of these comparisons were within one mismatch or gap.
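The black bear versus polar bear tally in point 5 can be illustrated with a short Python sketch. It assumes that "better" means a higher BLAST score, which is a simplification, since the comparisons in the text were judged on score and %ID together. The three pairs are copied from Table 1 (nos. 1, 2, and 5); the function name and labels are made up for the illustration.

```python
from collections import Counter


def compare(est_score, polar_score):
    """Label which hit of a pair has the higher BLAST score (assumed criterion)."""
    if est_score > polar_score:
        return "black bear better"
    if polar_score > est_score:
        return "polar bear better"
    return "equal"


# (Table 1 no., black bear EST score, polar bear score), copied from Table 1.
pairs = [
    (1, 540, 534),
    (2, 473, 473),
    (5, 455, 459),
]

tally = Counter(compare(est, polar) for _, est, polar in pairs)
for label, count in sorted(tally.items()):
    print(label, count)
```

Running the same comparison over all 29 black bear rows would give a score-only tally, which may differ slightly from the counts above wherever score and %ID disagree.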
A graphic summary of these results over the entire S26 sequence is shown in Fig. 1. The abscissa is the starting position of the hit sequence (“Start” in Table 1) along the 2.7M bp of the S26 sequence. The ordinate is %ID. It is readily seen that, with some exceptions, bear hits are concentrated at higher %ID and non-bear hits at lower %ID.
CONCLUSIONS
Before arriving at conclusions, the following points must always be kept in mind:
1. The S26 query sequence represents about 0.1% of an entire mammalian genome, not a “whole genome” as proclaimed in the Ketchum title and text.
2. The EST database does not contain complete genomes, only relatively short sequences thought to be important. Only the “Reference genomic sequences” database contains complete genomes, but the number of species there is much smaller than in the other databases.
3. Not all species are equally represented in the EST database, or in any of the NCBI databases.
4. Each species may have somewhat different genes represented in the EST database, so that direct comparisons over the same query sequence range may not always be possible.
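As a quick check of the "about 0.1%" figure in point 1, the arithmetic can be written out as below. The genome size used is an assumption: a round 3 billion bp, a commonly quoted order of magnitude for mammalian genomes, not a measured value for any bear.

```python
S26_LENGTH_BP = 2_726_786          # S26 query length given in the methods
ASSUMED_GENOME_BP = 3_000_000_000  # assumption: round figure for a mammalian genome

fraction_pct = 100 * S26_LENGTH_BP / ASSUMED_GENOME_BP
print(f"S26 covers roughly {fraction_pct:.2f}% of the assumed genome size")
```

A somewhat smaller assumed genome size raises the percentage slightly, but it stays on the order of 0.1%.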
However, the results above overwhelmingly favor a bear as the origin of S26. Matches to black bear and polar bear where possible were nearly equal. Based on these data alone there was no preference for one bear species over the other. In other words, the genes represented by these sequences are highly conserved, so that a full species determination cannot be made on the basis of only one polar bear specimen in the Reference genomic sequences database and the limited black bear data in the EST database. Intraspecies variation (SNPs) is likely as great as interspecies variation for these two bear species (in the same Ursus genus) over the matching genes in the EST database.
Sample 26 is a bear, based on all EST and polar bear matches with %ID ≥ 99%, and length ≥ 200bp. Only one sequence of 59 matched a dog better, which could be contamination, since the sample was found with the help of a dog. Consistent with these results and considering the California provenance, S26 is very likely a black bear. Such was the conclusion of three independent laboratories based on their separate sequencing experiments [5].
ACKNOWLEDGEMENT
Thanks go to Dr. Ketchum for making her S26 sequence available online.
This work received no financial or other material support.
CONFLICT OF INTEREST
The author declares no competing interests.
REFERENCES
[1] See “Paper 1” link on right.
[2] See “Sasquatch Genome Project” link on right.
[3] See on this blog, November 30, 2014, “Table 1 Updated: The Ketchum Sample 26 nDNA”.
[4] Parkinson J, Blaxter M., Expressed sequence tags: an overview, Methods Mol. Biol., 2009; 533:1-12. doi: 10.1007/978-1-60327-136-3_1.
[5] See on this blog, November 26, 2014, "Ketchum Sample 26, The Smeja Kill: Independent Lab Reports".
Table 1. S26 vs. EST Highest Scores, ≥ 99 %ID, ≥ 200 bp Length
No. | ACCESSION | %ID[a] | LEN[b] | MIS[c] | GAPS[d] | Start[e] | End[f] | SCORE[g] | Species
1. | GW310394.1 | 100 | 292 | 0 | 0 | 80747 | 81038 | 540 | black bear
 | NW_007927674.1 | 99.66 | 292 | 1 | 0 | 80747 | 81038 | 534 | polar bear
2. | GW298753.1 | 99.61 | 259 | 1 | 0 | 162255 | 162513 | 473 | black bear
 | NW_007930700.1 | 99.61 | 259 | 1 | 0 | 162255 | 162513 | 473 | polar bear
3. | GW309384.1 | 100 | 682 | 0 | 0 | 184476 | 185157 | 1260 | black bear
 | NW_007929447.1 | 100 | 682 | 0 | 0 | 184476 | 185157 | 1260 | polar bear
4. | GW299178.1 | 100 | 310 | 0 | 0 | 213697 | 214006 | 573 | black bear*
 | GW281787.1 | 99.52 | 629 | 3 | 0 | 213697 | 214325 | 1146 | black bear*
 | NW_007929448.1 | 99.84 | 629 | 1 | 0 | 213697 | 214325 | 1157 | polar bear
5. | GW308365.1 | 99.6 | 248 | 1 | 0 | 247118 | 247365 | 455 | black bear
 | NW_007929448.1 | 100 | 248 | 0 | 0 | 247118 | 247365 | 459 | polar bear
6. | GW291037.1 | 100 | 294 | 0 | 0 | 260962 | 261255 | 544 | black bear
 | NW_007929448.1 | 99.66 | 294 | 1 | 0 | 260962 | 261255 | 538 | polar bear
7. | GW292544.1 | 100 | 375 | 0 | 0 | 313146 | 313520 | 693 | black bear
 | NW_007907318.1 | 99.73 | 375 | 1 | 0 | 313146 | 313520 | 688 | polar bear
8. | AI573283.1 | 99.39 | 494 | 1 | 1 | 343463 | 343954 | 894 | human
 | NW_007907318.1 | 100 | 492 | 0 | 0 | 343463 | 343954 | 909 | polar bear
9. | BE749348.1 | 99.82 | 554 | 1 | 0 | 363248 | 363801 | 1018 | cow
 | NW_007907318.1 | 100 | 554 | 0 | 0 | 363248 | 363801 | 1024 | polar bear
10. | GW295806.1 | 99.42 | 346 | 2 | 0 | 380277 | 380622 | 628 | black bear
 | NW_007907318.1 | 100 | 346 | 0 | 0 | 380277 | 380622 | 640 | polar bear
11. | GW288445.1 | 99.26 | 269 | 1 | 1 | 389816 | 390083 | 484 | black bear
 | NW_007907318.1 | 100 | 268 | 0 | 0 | 389816 | 390083 | 496 | polar bear
12. | GW306627.1 | 100 | 253 | 0 | 0 | 495501 | 495753 | 468 | black bear
 | NW_007907078.1 | 100 | 253 | 0 | 0 | 495501 | 495753 | 468 | polar bear
13. | GW298500.1 | 99.34 | 304 | 2 | 0 | 542051 | 542354 | 551 | black bear
 | NW_007907078.1 | 100 | 304 | 0 | 0 | 542051 | 542354 | 562 | polar bear
14. | BF974021.1 | 99.24 | 264 | 2 | 0 | 673229 | 673492 | 477 | human
 | NW_007907078.1 | 99.62 | 265 | 0 | 1 | 673229 | 673492 | 483 | polar bear
15. | HX217873.1 | 100 | 223 | 0 | 0 | 699417 | 699639 | 412 | pig
 | NW_007907078.1 | 100 | 223 | 0 | 0 | 699417 | 699639 | 412 | polar bear
16. | HX217873.1 | 99.18 | 367 | 1 | 2 | 700252 | 700618 | 660 | pig
 | NW_007907078.1 | 100 | 367 | 0 | 0 | 700252 | 700618 | 678 | polar bear
17. | GW290970.1 | 100 | 264 | 0 | 0 | 749867 | 750130 | 488 | black bear
 | NW_007907078.1 | 98.86 | 264 | 3 | 0 | 749867 | 750130 | 472 | polar bear
18. | CO686642.1 | 99.7 | 336 | 0 | 1 | 754448 | 754783 | 614 | dog
 | NW_007907078.1 | 100 | 336 | 0 | 0 | 754448 | 754783 | 621 | polar bear
19. | DN866489.1 | 99.75 | 403 | 1 | 0 | 778998 | 779400 | 739 | dog
 | NW_007907078.1 | 100 | 403 | 0 | 0 | 778998 | 779400 | 745 | polar bear
20. | DN431979.1 | 99.71 | 341 | 0 | 1 | 779637 | 779977 | 623 | many
 | NW_007907078.1 | 99.71 | 341 | 1 | 0 | 779637 | 779977 | 625 | polar bear
21. | GW292374.1 | 99.7 | 329 | 0 | 1 | 780284 | 780612 | 601 | black bear
 | NW_007907078.1 | 99.09 | 329 | 2 | 1 | 780284 | 780612 | 590 | polar bear
22. | DT539446.1 | 99.68 | 313 | 0 | 1 | 1041919 | 1042231 | 571 | dog
 | NW_007907185.1 | 100 | 313 | 0 | 0 | 1041919 | 1042231 | 579 | polar bear
23. | AA830903.1 | 99.15 | 353 | 3 | 0 | 1043067 | 1043419 | 636 | human
 | NW_007907185.1 | 100 | 353 | 0 | 0 | 1043067 | 1043419 | 652 | polar bear
24. | DN379914.1 | 99.16 | 356 | 3 | 0 | 1050478 | 1050833 | 641 | dog
 | NW_007907185.1 | 99.72 | 356 | 1 | 0 | 1050478 | 1050833 | 652 | polar bear
25. | CF411400.1 | 99.56 | 459 | 2 | 0 | 1056408 | 1056866 | 837 | dog
 | NW_007907185.1 | 100 | 459 | 0 | 0 | 1056408 | 1056866 | 848 | polar bear
26. | DN442006.1 | 99.58 | 236 | 1 | 0 | 1057402 | 1057637 | 431 | dog
 | NW_007907185.1 | 100 | 236 | 0 | 0 | 1057402 | 1057637 | 436 | polar bear
27. | GW279426.1 | 99.32 | 443 | 2 | 1 | 1261885 | 1262326 | 800 | black bear
 | NW_007907319.1 | 94.32 | 370 | 20 | 1 | 1261885 | 1262253 | 566 | polar bear
28. | GW291314.1 | 99.79 | 473 | 1 | 0 | 1311093 | 1311565 | 870 | black bear
 | NW_007907319.1 | 97.23 | 470 | 0 | 1 | 1311093 | 1311562 | 784 | polar bear
29. | FS641411.1 | 99.11 | 224 | 2 | 0 | 1320670 | 1320893 | 403 | pig
 | NW_007907319.1 | 100 | 224 | 0 | 0 | 1320670 | 1320893 | 414 | polar bear
30. | EE822031.1 | 99.57 | 234 | 0 | 1 | 1321973 | 1322206 | 425 | sheep
 | NW_007907319.1 | 100 | 234 | 0 | 0 | 1321973 | 1322206 | 433 | polar bear
31. | DT537638.1 | 99.63 | 541 | 1 | 1 | 1323769 | 1324309 | 987 | dog
 | NW_007907319.1 | 100 | 283 | 0 | 0 | 1323769 | 1324051 | 523 | polar bear#
 | NW_007907319.1 | 100 | 86 | 0 | 0 | 1324224 | 1324309 | 159 | polar bear#
32. | EW079092.2 | 99.01 | 303 | 2 | 1 | 1325115 | 1325416 | 542 | pig
 | NW_007907319.1 | 100 | 302 | 0 | 0 | 1325115 | 1325416 | 558 | polar bear
33. | GW287019.1 | 99.63 | 269 | 1 | 0 | 1326203 | 1326471 | 492 | black bear
 | NW_007907319.1 | 99.26 | 269 | 2 | 0 | 1326203 | 1326471 | 486 | polar bear
34. | DR423679.1 | 99 | 399 | 4 | 0 | 1341810 | 1342208 | 715 | human
 | NW_007907319.1 | 99.5 | 399 | 2 | 0 | 1341810 | 1342208 | 726 | polar bear
35. | GW314371.1 | 99.4 | 332 | 0 | 1 | 1342901 | 1343230 | 601 | black bear
 | NW_007907319.1 | 99.15 | 234 | 0 | 1 | 1342999 | 1343230 | 420 | polar bear
36. | GW306862.1 | 99.01 | 404 | 4 | 0 | 1361914 | 1362317 | 730 | black bear
 | NW_007907319.1 | 100 | 404 | 0 | 0 | 1361914 | 1362317 | 747 | polar bear
37. | GW278008.1 | 99.74 | 390 | 1 | 0 | 1370591 | 1370980 | 717 | black bear
 | NW_007907319.1 | 99.74 | 390 | 1 | 0 | 1370591 | 1370980 | 715 | polar bear
38. | GW296089.1 | 100 | 310 | 0 | 0 | 1387158 | 1387467 | 573 | black bear
 | NW_007907177.1 | 100 | 310 | 0 | 0 | 1387158 | 1387467 | 573 | polar bear
39. | DN273193.1 | 99.32 | 294 | 2 | 0 | 1493043 | 1493336 | 532 | dog
 | NW_007907285.1 | 100 | 294 | 0 | 0 | 1493043 | 1493336 | 544 | polar bear
40. | CX057394.1 | 100 | 301 | 0 | 0 | 1513922 | 1514222 | 556 | many
 | NW_007907142.1 | 100 | 300 | 0 | 0 | 1513923 | 1514222 | 555 | polar bear
41. | GH523317.1 | 99.39 | 488 | 3 | 0 | 1616909 | 1617396 | 885 | deer mouse
 | NW_007907229.1 | 100 | 488 | 0 | 0 | 1616909 | 1617396 | 902 | polar bear
42. | AU131045.1 | 100 | 392 | 0 | 0 | 1617005 | 1617396 | 725 | human
 | NW_007907229.1 | 100 | 392 | 0 | 0 | 1617005 | 1617396 | 725 | polar bear
43. | CF411212.1 | 99 | 300 | 3 | 0 | 1620530 | 1620829 | 538 | dog
 | NW_007907229.1 | 99.67 | 300 | 0 | 1 | 1620530 | 1620829 | 547 | polar bear
44. | DA189290.1 | 99.22 | 255 | 2 | 0 | 1647778 | 1648032 | 460 | human
 | NW_007907229.1 | 100 | 255 | 0 | 0 | 1647778 | 1648032 | 472 | polar bear
45. | DN749512.1 | 99.36 | 311 | 1 | 1 | 1663216 | 1663526 | 562 | dog
 | NW_007907230.1 | 100 | 311 | 0 | 0 | 1663216 | 1663526 | 575 | polar bear
46. | EW538766.2 | 99.36 | 311 | 1 | 1 | 1663527 | 1663836 | 562 | pig
 | NW_007907230.1 | 100 | 310 | 0 | 0 | 1663527 | 1663836 | 573 | polar bear
47. | DB179765.1 | 99.22 | 387 | 3 | 0 | 1805288 | 1805674 | 699 | human
 | NW_007907090.1 | 100 | 387 | 0 | 0 | 1805288 | 1805674 | 715 | polar bear
48. | GW307456.1 | 99.63 | 268 | 1 | 0 | 1961310 | 1961577 | 490 | black bear
 | NW_007907111.1 | 100 | 268 | 0 | 0 | 1961310 | 1961577 | 496 | polar bear
49. | GW293348.1 | 99.66 | 298 | 1 | 0 | 1971119 | 1971416 | 545 | black bear
 | NW_007907111.1 | 100 | 298 | 0 | 0 | 1971119 | 1971416 | 551 | polar bear
50. | GW309342.1 | 100 | 262 | 0 | 0 | 1999882 | 2000143 | 484 | black bear
 | NW_007907111.1 | 100 | 262 | 0 | 0 | 1999882 | 2000143 | 484 | polar bear
51. | GW308592.1 | 99.78 | 446 | 1 | 0 | 2000502 | 2000947 | 819 | black bear
 | NW_007907111.1 | 100 | 446 | 0 | 0 | 2000502 | 2000947 | 824 | polar bear
52. | GW287313.1 | 99.09 | 331 | 0 | 1 | 2102443 | 2102770 | 592 | black bear
 | NW_007907111.1 | 98.79 | 331 | 1 | 1 | 2102443 | 2102770 | 586 | polar bear
53. | GW298993.1 | 99.73 | 371 | 1 | 0 | 2302305 | 2302675 | 680 | black bear
 | NW_007907111.1 | 100 | 371 | 0 | 0 | 2302305 | 2302675 | 686 | polar bear
54. | DB197414.1 | 99.09 | 328 | 3 | 0 | 2430108 | 2430435 | 590 | human
 | NW_007907090.1 | 100 | 327 | 0 | 0 | 2430108 | 2430434 | 604 | polar bear
55. | GW304781.1 | 99.65 | 288 | 1 | 0 | 2478714 | 2479001 | 527 | black bear
 | NW_007907090.1 | 100 | 288 | 0 | 0 | 2478714 | 2479001 | 532 | polar bear
56. | GW292148.1 | 100 | 247 | 0 | 0 | 2550774 | 2551020 | 457 | black bear
 | No Match | - | - | - | - | - | - | - | polar bear
57. | DN379546.1 | 99.31 | 291 | 2 | 0 | 2560046 | 2560336 | 527 | dog
 | NW_007907090.1 | 98.61 | 287 | 4 | 0 | 2560046 | 2560332 | 508 | polar bear
 | NC_006587.3 | 98.97 | 291 | 1 | 1 | 2560046 | 2560336 | 520 | dog+
58. | BW962463.1 | 99.62 | 260 | 0 | 1 | 2585217 | 2585476 | 473 | pig
 | NW_007907090.1 | 100 | 260 | 0 | 0 | 2585217 | 2585476 | 481 | polar bear
59. | GW306071.1 | 100 | 351 | 0 | 0 | 2654215 | 2654565 | 649 | black bear
 | No Match | - | - | - | - | - | - | - | polar bear
[a] Percentage of matching base pairs (bp) over the sequence range Start to End.
[b] Length of matching sequence, bp.
[c] Mismatches, bp.
[d] Gaps in either query or target; number of gaps, not bp.
[e] Hit starting position on S26 sequence, bp.
[f] Hit end position on S26 sequence, bp.
[g] Score, see NCBI BLAST(TM) Handbook.
* Overlapped.
# Combined.
+ See Text