Friday, May 22, 2015

New Black Bear Data Show Ketchum Sample 26 (the Smeja Kill) is a Bear




PREFACE

I’ve been asked, “Why are we still talking about this?” (the Ketchum paper). At the risk of being presumptuous, I assume this came from those who are convinced that the evidence proves the Ketchum conclusions wrong. In any case,

1. Dr. Melba Ketchum, DVM, continues to appear on sensationalist radio shows arguing for her study and its conclusions before predisposed, sympathetic, biased interviewers and largely uninformed audiences.


2. She dismisses any opposing scientific analysis, not by addressing the technical details, but by claiming her critics have neither the background, education, institutional association, nor software tools to make valid judgements about her work. Yet she doesn’t consistently apply the same criteria to her own coauthors and consultants, or to herself.

3. Matching DNA sequences at NCBI is a moving target, as new data are constantly being added. New data -->> New results -->> Maybe new conclusions (maybe not). Science is always in flux.

4. This has been a learning process for me. As I make new personal (not necessarily new, new) discoveries, I like to share them, which is the purpose of any blog. Others have found my work helpful. 

5. Nobody, particularly from among Ketchum coauthors and consultants, has pointed out any inaccuracies in my work. I have tried unsuccessfully to engage these folks and hope eventually to hear from them. Some won’t even be identified by name. Science has no place for anonymity. If you can’t stand behind your work, why should anybody take it seriously? In this context I have to commend Dr. Ketchum for at least standing up for her work. I take it very seriously.  The blog below took approximately 40 hr to complete and involved dozens of independent searches. 




New Black Bear Data Show Ketchum Sample 26 (the Smeja Kill) is a Bear

by Haskell Hart




ABSTRACT


The nuclear DNA sequence of Sample 26 (the Smeja kill) of the Ketchum study, “Novel North American Hominins: Next Generation Sequencing of Three Whole Genomes and Associated Studies”, was compared to the “Expressed sequence tags” database of the National Center for Biotechnology Information (NCBI) with BLAST(TM) search software. Fifty-nine hits with %ID ≥ 99% and sequence length ≥ 200 bp were further compared to the whole genome polar bear sequence in the NCBI Reference genomic sequences database. Results showed a preference for a bear in 58 of 59 matches (one was a dog), with black bear and polar bear best matches about equal in number where both species could be compared. Once again, S26 proves to be a bear, most likely a black bear considering its California origin.

INTRODUCTION



Previous analysis [1] of the nDNA sequence [2] of Ketchum Sample 26 (S26), the Smeja kill, concluded that this sample was a bear, most likely a black bear (Ursus americanus). However, the paucity of black bear sequences in several NCBI databases searched prevented absolute species determination. Instead, giant panda (Ailuropoda melanoleuca) and polar bear (Ursus maritimus) sequences were nearly always the best hits compared to all other species. Later, after its whole genome was available for query in 2014, the polar bear alone had the best hits [3]. Since S26 was from California, it was natural to assume that the only bear species extant there, the black bear, was the origin. This work will compare the S26 sequence to expressed sequence tags (EST) in the EST database of NCBI, a previously untapped database.

An EST is a cDNA (complementary DNA) sequence which is produced by reverse-transcribing and cloning the corresponding mRNA obtained from extracts from important organs, e.g. eyes, heart, lung, kidney, liver, or muscle [4]. Each sequence is usually less than 1000 bp (typically a few hundred bp), which makes amplification and sequencing relatively easy compared to whole genome methods. These sequences are often highly conserved (as organs function similarly among different species), so picking the “best” hit from among many “good” hits is critical.

Fortunately, the EST database contains a large number (38,757) of mostly nonredundant black bear sequences. There are no other bear sequences in this database. By comparison, the popular “nucleotide” database, the only one considered by the Ketchum team, contains only 1663 black bear sequences and 216,511 giant panda and 100,160 polar bear sequences as of this date. Thus, it seemed like an unusual opportunity to search the EST database for black bear (and, of course, other) matches.
 

COMPUTATIONAL METHODS



The S26 nDNA sequence, 2,726,786 bp, was previously downloaded from the online Ketchum paper [1] and, as the query, was not broken down into smaller segments, as is commonly done by others, including the Ketchum consultants. The smaller the segments are, the more likely the match to multiple species, because information is lost.

All searches were against the NCBI “Expressed sequence tags” database with the BLAST(TM) software.  Search parameters were the default ones except that the word size was increased to 64 when the full S26 sequence was queried to conserve CPU and memory. The maximum number of hits was increased to 5,000 so as not to miss any significant matches. All search results were downloaded as Excel .csv files and converted to .xlsx files for further sorting and screening.
 

RESULTS AND DISCUSSION



To start with, we limited the query to black bear matches, which produced a promising list of 1503 black bear matches. Considering the volume of data and the degree of match, it was decided to consider only those hits with length ≥ 200 bp and %ID ≥ 99% in all future species comparisons. This list, sorted by score and %ID, was used as a check against matches in the general search.
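The screening step can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet procedure actually used here. It assumes the downloaded hit table is comma-separated with no header row and follows the standard 12-column BLAST tabular order (query, subject, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, e-value, bit score). The names `Hit`, `parse_hit_table` and `screen` are hypothetical. The sample rows reuse values from Table 1 (EST and polar bear rows mixed, purely to exercise the thresholds); the query name, subject coordinates and e-values are placeholders.

```python
import csv
import io
from typing import NamedTuple


class Hit(NamedTuple):
    accession: str
    pct_id: float
    length: int
    mismatches: int
    gap_opens: int
    q_start: int
    q_end: int
    score: float


def parse_hit_table(text: str) -> list[Hit]:
    """Parse 12-column BLAST tabular output (comma-separated, no header)."""
    hits = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 12:
            continue  # skip blank or malformed lines
        hits.append(Hit(row[1], float(row[2]), int(row[3]), int(row[4]),
                        int(row[5]), int(row[6]), int(row[7]), float(row[11])))
    return hits


def screen(hits, min_len=200, min_pct_id=99.0):
    """Keep hits meeting both thresholds, sorted by score, then %ID (best first)."""
    kept = [h for h in hits if h.length >= min_len and h.pct_id >= min_pct_id]
    return sorted(kept, key=lambda h: (h.score, h.pct_id), reverse=True)


# Values for columns 2-8 and 12 are taken from Table 1; "S26", the subject
# start/end and the e-value columns are placeholders, not real output.
SAMPLE = """
S26,GW310394.1,100,292,0,0,80747,81038,1,292,0.0,540
S26,NW_007907319.1,94.32,370,20,1,1261885,1262253,1,370,0.0,566
S26,NW_007907319.1,100,86,0,0,1324224,1324309,1,86,0.0,159
S26,GW309384.1,100,682,0,0,184476,185157,1,682,0.0,1260
"""

screened = screen(parse_hit_table(SAMPLE))
for h in screened:
    print(h.accession, h.pct_id, h.length, h.score)
```

The 94.32% row fails the %ID threshold and the 86 bp row fails the length threshold, so only two of the four sample rows survive.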

To consider all species matches, a general search was performed (maximum number of database hits 5000) with no species limitations. The list of 22,035 hits (multiple per database hit) was downloaded and sorted by %ID, then length. Culling this list by hand produced a list of all hits satisfying the above criteria. These 59 hits are seen in Table 1. Twenty-nine were black bear, 11 dog, eight human, six pig, one sheep, one deer mouse, one cow, and two that matched many species equally (including some plants).
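As a cross-check on the tally above, the species labels of the 59 numbered entries in Table 1 can be counted mechanically. The sketch below transcribes the entry numbers by species from Table 1; the dictionary name `TABLE1_ENTRIES_BY_SPECIES` is made up for this illustration.

```python
from collections import Counter

# Entry numbers from Table 1, grouped by the species of the EST hit.
TABLE1_ENTRIES_BY_SPECIES = {
    "black bear": [1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 17, 21, 27, 28, 33,
                   35, 36, 37, 38, 48, 49, 50, 51, 52, 53, 55, 56, 59],
    "dog": [18, 19, 22, 24, 25, 26, 31, 39, 43, 45, 57],
    "human": [8, 14, 23, 34, 42, 44, 47, 54],
    "pig": [15, 16, 29, 32, 46, 58],
    "many": [20, 40],
    "cow": [9],
    "sheep": [30],
    "deer mouse": [41],
}

counts = Counter({sp: len(nums) for sp, nums in TABLE1_ENTRIES_BY_SPECIES.items()})
total = sum(counts.values())
non_black_bear = total - counts["black bear"]

for species, n in counts.most_common():
    print(f"{species:<11}{n}")
print("total", total, "| non black bear", non_black_bear)
```

The counts add up to 59, of which 30 are not black bear, in agreement with the numbers quoted in the text.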

For those hits which were not a black bear (30), searches over the same S26 sequence ranges were performed against the “Reference genomic sequences” database, limited to polar bear (the closest species to black bear in that database). These hits are listed in bold italics immediately below the corresponding EST hits in Table 1, and are included in Fig. 1.

The black bear hits were also compared to polar bear in the Reference genomic sequences database. These results are also listed, underlined, immediately below each black bear hit in Table 1. These polar bear hits were not included in Fig. 1.

To ensure that all EST hits (bear and nonbear) were the best over their various sequence ranges, the entire hit list was scrutinized in the vicinities of these sequence ranges. The hits generally stood out by %ID and no other competing matches were found.
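The scrutiny described above was done by hand in a spreadsheet. A rough programmatic equivalent, offered only as a sketch, is to look for hits whose S26 range overlaps a selected hit and whose %ID is higher. The names `Hit`, `overlaps` and `competitors` are hypothetical. The three test rows are from Table 1; entries 41 and 42 happen to cover overlapping S26 ranges, so they make a convenient test case.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    label: str
    pct_id: float
    start: int  # hit start on the S26 sequence, bp
    end: int    # hit end on the S26 sequence, bp


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two closed intervals [start, end] share at least one base."""
    return a.start <= b.end and b.start <= a.end


def competitors(selected: Hit, all_hits: list[Hit]) -> list[Hit]:
    """Other hits overlapping `selected` on S26 with a strictly higher %ID."""
    return [h for h in all_hits
            if h != selected and overlaps(selected, h) and h.pct_id > selected.pct_id]


hits = [
    Hit("1. GW310394.1 black bear", 100.0, 80747, 81038),
    Hit("41. GH523317.1 deer mouse", 99.39, 1616909, 1617396),
    Hit("42. AU131045.1 human", 100.0, 1617005, 1617396),
]

for h in hits:
    print(h.label, "->", [c.label for c in competitors(h, hits)])
```

A check like this only flags candidates; whether a shorter overlapping hit with higher %ID really counts as a better match is still a judgement call.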

Overall results can be summarized thusly:

1. 29 of 59 best hits by score and %ID were black bear.

2. Of the 30 non-black-bear hits, 27 were better matched over the same sequence range by polar bear (bold italics) than by the non-bear EST hit. None of these hit sequence ranges had any black bear data in the database.

3. Two matches (15 and 40 in Table 1) were equal: between pig and polar bear and between many species and polar bear, respectively.

4. Only one best match (57 in Table 1) was not a bear but a dog. For confirmation, the Reference genomic sequences database was queried limited to dog (third line for no. 57). This match was also better than the polar bear match.

5. Comparing polar bear (underlined) to black bear, 11 of these polar bear hits were better than the corresponding black bear hits, five were equal, and 13 black bear hits were better than polar bear. Many of these comparisons were within one mismatch or gap.
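The pairwise comparisons summarized above can be illustrated with a small sketch. The exact tie-breaking rule used for the tallies is not spelled out in this post, so the rule below (compare %ID first, then score) is an assumption, and the names `Row` and `compare` are hypothetical. The three pairs are entries 8, 15 and 57 of Table 1, whose outcomes are the same under any reasonable rule.

```python
from collections import Counter
from typing import NamedTuple


class Row(NamedTuple):
    pct_id: float
    score: int


def compare(est: Row, ref: Row) -> str:
    """Compare an EST hit with the polar bear hit over the same S26 range.

    Assumed rule: higher %ID wins; if %ID ties, higher score wins.
    """
    if (ref.pct_id, ref.score) > (est.pct_id, est.score):
        return "polar bear better"
    if (ref.pct_id, ref.score) == (est.pct_id, est.score):
        return "equal"
    return "EST better"


# (EST row, polar bear row) pairs taken from Table 1.
pairs = {
    8: (Row(99.39, 894), Row(100.0, 909)),    # human vs. polar bear
    15: (Row(100.0, 412), Row(100.0, 412)),   # pig vs. polar bear
    57: (Row(99.31, 527), Row(98.61, 508)),   # dog vs. polar bear
}

outcomes = {no: compare(est, ref) for no, (est, ref) in pairs.items()}
tally = Counter(outcomes.values())
print(outcomes)
print(tally)
```

Borderline cases, where %ID is identical and the scores differ by a point or two, depend on the rule chosen, which is why the rule is flagged as an assumption here.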

A graphic summary of these results over the entire S26 sequence is shown in Fig. 1. The abscissa is the starting position of the hit sequence (“Start” in Table 1) along the 2.7M bp of the S26 sequence. The ordinate is %ID. It is readily seen that generally (with some exceptions), bear hits are concentrated at higher %ID and non-bear hits at lower %ID.


CONCLUSIONS



Before arriving at conclusions, the following points must always be kept in mind:

1. The S26 query sequence represents about 0.1% of an entire mammalian genome, not a “whole genome” as proclaimed in the Ketchum title and text.

2. The EST database does not contain complete genomes, only relatively short sequences thought to be important. Only the “Reference genomic sequences” database contains complete genomes, but the number of species there is much smaller than in the other databases.

3. Not all species are equally represented in the EST database, or in any of the NCBI databases. 

4. Each species may have somewhat different genes represented in the EST database, so that direct comparisons over the same query sequence range may not always be possible.
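The "about 0.1%" figure in point 1 is simple arithmetic. The sketch below divides the S26 length given in Computational Methods by a mammalian genome size. The two genome sizes are round-number assumptions chosen to bracket typical mammals; they are not values taken from the Ketchum paper or from this analysis.

```python
S26_LENGTH_BP = 2_726_786  # length of the S26 query sequence, from the text

# Assumed haploid genome sizes in bp (illustrative bracket, not measured values).
ASSUMED_GENOME_SIZES_BP = {"smaller genome": 2.3e9, "larger genome": 3.2e9}

fractions_pct = {
    name: 100 * S26_LENGTH_BP / size
    for name, size in ASSUMED_GENOME_SIZES_BP.items()
}

for name, pct in fractions_pct.items():
    print(f"{name}: S26 covers {pct:.3f}% of the genome")
```

Under either assumption the result is on the order of one tenth of one percent.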

However, the results above overwhelmingly favor a bear as the origin of S26. Matches to black bear and polar bear where possible were nearly equal. Based on these data alone there was no preference for one bear species over the other. In other words, the genes represented by these sequences are highly conserved, so that a full species determination cannot be made on the basis of only one polar bear specimen in the Reference genomic sequences database and the limited black bear data in the EST database. Intraspecies variation (SNPs) is likely as great as interspecies variation for these two bear species (in the same Ursus genus) over the matching genes in the EST database.

Sample 26 is a bear, based on all EST and polar bear matches with %ID ≥ 99%, and length ≥ 200bp. Only one sequence of 59 matched a dog better, which could be contamination, since the sample was found with the help of a dog. Consistent with these results and considering the California provenance, S26 is very likely a black bear.  Such was the conclusion of three independent laboratories based on their separate sequencing experiments [5].

ACKNOWLEDGEMENT 



Thanks go to Dr. Ketchum for making her S26 sequence available online.

This work received no financial support or other material support.

CONFLICT OF INTEREST




The author declares no competing interests.

REFERENCES



[1] See “Paper 1” link on right.

[2] See “Sasquatch Genome Project” link on right.

[3] See on this blog, November 30, 2014, “Table 1 Updated: The Ketchum Sample 26 nDNA”.

[4] Parkinson J, Blaxter M., Expressed sequence tags: an overview, Methods Mol. Biol., 2009; 533:1-12. doi: 10.1007/978-1-60327-136-3_1.


[5]  See on this blog, November 26, 2014, "Ketchum Sample 26, The Smeja Kill: Independent Lab Reports".


Table 1.  S26 vs. EST Highest Scores, ≥ 99 %ID, ≥ 200 bp Length

No.  ACCESSION        %ID[a]  LEN[b]  MIS[c]  GAPS[d]  Start[e]  End[f]    SCORE[g]  Species
1.   GW310394.1       100     292     0       0        80747     81038     540       black bear
     NW_007927674.1   99.66   292     1       0        80747     81038     534       polar bear
2.   GW298753.1       99.61   259     1       0        162255    162513    473       black bear
     NW_007930700.1   99.61   259     1       0        162255    162513    473       polar bear
3.   GW309384.1       100     682     0       0        184476    185157    1260      black bear
     NW_007929447.1   100     682     0       0        184476    185157    1260      polar bear
4.   GW299178.1       100     310     0       0        213697    214006    573       black bear*
     GW281787.1       99.52   629     3       0        213697    214325    1146      black bear*
     NW_007929448.1   99.84   629     1       0        213697    214325    1157      polar bear
5.   GW308365.1       99.6    248     1       0        247118    247365    455       black bear
     NW_007929448.1   100     248     0       0        247118    247365    459       polar bear
6.   GW291037.1       100     294     0       0        260962    261255    544       black bear
     NW_007929448.1   99.66   294     1       0        260962    261255    538       polar bear
7.   GW292544.1       100     375     0       0        313146    313520    693       black bear
     NW_007907318.1   99.73   375     1       0        313146    313520    688       polar bear
8.   AI573283.1       99.39   494     1       1        343463    343954    894       human
     NW_007907318.1   100     492     0       0        343463    343954    909       polar bear
9.   BE749348.1       99.82   554     1       0        363248    363801    1018      cow
     NW_007907318.1   100     554     0       0        363248    363801    1024      polar bear
10.  GW295806.1       99.42   346     2       0        380277    380622    628       black bear
     NW_007907318.1   100     346     0       0        380277    380622    640       polar bear
11.  GW288445.1       99.26   269     1       1        389816    390083    484       black bear
     NW_007907318.1   100     268     0       0        389816    390083    496       polar bear
12.  GW306627.1       100     253     0       0        495501    495753    468       black bear
     NW_007907078.1   100     253     0       0        495501    495753    468       polar bear
13.  GW298500.1       99.34   304     2       0        542051    542354    551       black bear
     NW_007907078.1   100     304     0       0        542051    542354    562       polar bear
14.  BF974021.1       99.24   264     2       0        673229    673492    477       human
     NW_007907078.1   99.62   265     0       1        673229    673492    483       polar bear
15.  HX217873.1       100     223     0       0        699417    699639    412       pig
     NW_007907078.1   100     223     0       0        699417    699639    412       polar bear
16.  HX217873.1       99.18   367     1       2        700252    700618    660       pig
     NW_007907078.1   100     367     0       0        700252    700618    678       polar bear
17.  GW290970.1       100     264     0       0        749867    750130    488       black bear
     NW_007907078.1   98.86   264     3       0        749867    750130    472       polar bear
18.  CO686642.1       99.7    336     0       1        754448    754783    614       dog
     NW_007907078.1   100     336     0       0        754448    754783    621       polar bear
19.  DN866489.1       99.75   403     1       0        778998    779400    739       dog
     NW_007907078.1   100     403     0       0        778998    779400    745       polar bear
20.  DN431979.1       99.71   341     0       1        779637    779977    623       many
     NW_007907078.1   99.71   341     1       0        779637    779977    625       polar bear
21.  GW292374.1       99.7    329     0       1        780284    780612    601       black bear
     NW_007907078.1   99.09   329     2       1        780284    780612    590       polar bear
22.  DT539446.1       99.68   313     0       1        1041919   1042231   571       dog
     NW_007907185.1   100     313     0       0        1041919   1042231   579       polar bear
23.  AA830903.1       99.15   353     3       0        1043067   1043419   636       human
     NW_007907185.1   100     353     0       0        1043067   1043419   652       polar bear
24.  DN379914.1       99.16   356     3       0        1050478   1050833   641       dog
     NW_007907185.1   99.72   356     1       0        1050478   1050833   652       polar bear
25.  CF411400.1       99.56   459     2       0        1056408   1056866   837       dog
     NW_007907185.1   100     459     0       0        1056408   1056866   848       polar bear
26.  DN442006.1       99.58   236     1       0        1057402   1057637   431       dog
     NW_007907185.1   100     236     0       0        1057402   1057637   436       polar bear
27.  GW279426.1       99.32   443     2       1        1261885   1262326   800       black bear
     NW_007907319.1   94.32   370     20      1        1261885   1262253   566       polar bear
28.  GW291314.1       99.79   473     1       0        1311093   1311565   870       black bear
     NW_007907319.1   97.23   470     0       1        1311093   1311562   784       polar bear
29.  FS641411.1       99.11   224     2       0        1320670   1320893   403       pig
     NW_007907319.1   100     224     0       0        1320670   1320893   414       polar bear
30.  EE822031.1       99.57   234     0       1        1321973   1322206   425       sheep
     NW_007907319.1   100     234     0       0        1321973   1322206   433       polar bear
31.  DT537638.1       99.63   541     1       1        1323769   1324309   987       dog
     NW_007907319.1   100     283     0       0        1323769   1324051   523       polar bear#
     NW_007907319.1   100     86      0       0        1324224   1324309   159       polar bear#
32.  EW079092.2       99.01   303     2       1        1325115   1325416   542       pig
     NW_007907319.1   100     302     0       0        1325115   1325416   558       polar bear
33.  GW287019.1       99.63   269     1       0        1326203   1326471   492       black bear
     NW_007907319.1   99.26   269     2       0        1326203   1326471   486       polar bear
34.  DR423679.1       99      399     4       0        1341810   1342208   715       human
     NW_007907319.1   99.5    399     2       0        1341810   1342208   726       polar bear
35.  GW314371.1       99.4    332     0       1        1342901   1343230   601       black bear
     NW_007907319.1   99.15   234     0       1        1342999   1343230   420       polar bear
36.  GW306862.1       99.01   404     4       0        1361914   1362317   730       black bear
     NW_007907319.1   100     404     0       0        1361914   1362317   747       polar bear
37.  GW278008.1       99.74   390     1       0        1370591   1370980   717       black bear
     NW_007907319.1   99.74   390     1       0        1370591   1370980   715       polar bear
38.  GW296089.1       100     310     0       0        1387158   1387467   573       black bear
     NW_007907177.1   100     310     0       0        1387158   1387467   573       polar bear
39.  DN273193.1       99.32   294     2       0        1493043   1493336   532       dog
     NW_007907285.1   100     294     0       0        1493043   1493336   544       polar bear
40.  CX057394.1       100     301     0       0        1513922   1514222   556       many
     NW_007907142.1   100     300     0       0        1513923   1514222   555       polar bear
41.  GH523317.1       99.39   488     3       0        1616909   1617396   885       deer mouse
     NW_007907229.1   100     488     0       0        1616909   1617396   902       polar bear
42.  AU131045.1       100     392     0       0        1617005   1617396   725       human
     NW_007907229.1   100     392     0       0        1617005   1617396   725       polar bear
43.  CF411212.1       99      300     3       0        1620530   1620829   538       dog
     NW_007907229.1   99.67   300     0       1        1620530   1620829   547       polar bear
44.  DA189290.1       99.22   255     2       0        1647778   1648032   460       human
     NW_007907229.1   100     255     0       0        1647778   1648032   472       polar bear
45.  DN749512.1       99.36   311     1       1        1663216   1663526   562       dog
     NW_007907230.1   100     311     0       0        1663216   1663526   575       polar bear
46.  EW538766.2       99.36   311     1       1        1663527   1663836   562       pig
     NW_007907230.1   100     310     0       0        1663527   1663836   573       polar bear
47.  DB179765.1       99.22   387     3       0        1805288   1805674   699       human
     NW_007907090.1   100     387     0       0        1805288   1805674   715       polar bear
48.  GW307456.1       99.63   268     1       0        1961310   1961577   490       black bear
     NW_007907111.1   100     268     0       0        1961310   1961577   496       polar bear
49.  GW293348.1       99.66   298     1       0        1971119   1971416   545       black bear
     NW_007907111.1   100     298     0       0        1971119   1971416   551       polar bear
50.  GW309342.1       100     262     0       0        1999882   2000143   484       black bear
     NW_007907111.1   100     262     0       0        1999882   2000143   484       polar bear
51.  GW308592.1       99.78   446     1       0        2000502   2000947   819       black bear
     NW_007907111.1   100     446     0       0        2000502   2000947   824       polar bear
52.  GW287313.1       99.09   331     0       1        2102443   2102770   592       black bear
     NW_007907111.1   98.79   331     1       1        2102443   2102770   586       polar bear
53.  GW298993.1       99.73   371     1       0        2302305   2302675   680       black bear
     NW_007907111.1   100     371     0       0        2302305   2302675   686       polar bear
54.  DB197414.1       99.09   328     3       0        2430108   2430435   590       human
     NW_007907090.1   100     327     0       0        2430108   2430434   604       polar bear
55.  GW304781.1       99.65   288     1       0        2478714   2479001   527       black bear
     NW_007907090.1   100     288     0       0        2478714   2479001   532       polar bear
56.  GW292148.1       100     247     0       0        2550774   2551020   457       black bear
     No Match                                                                        polar bear
57.  DN379546.1       99.31   291     2       0        2560046   2560336   527       dog
     NW_007907090.1   98.61   287     4       0        2560046   2560332   508       polar bear
     NC_006587.3      98.97   291     1       1        2560046   2560336   520       dog+
58.  BW962463.1       99.62   260     0       1        2585217   2585476   473       pig
     NW_007907090.1   100     260     0       0        2585217   2585476   481       polar bear
59.  GW306071.1       100     351     0       0        2654215   2654565   649       black bear
     No Match                                                                        polar bear



[a]   Percentage of matching base pairs (bp) over the sequence range Start to End.
[b]  Length of matching sequence, bp.
[c]  Mismatches, bp.
[d]  Gaps in either query or target: the number of gaps, not bp.
[e]  Hit starting position on S26 sequence, bp.
[f]  Hit end position on S26 sequence, bp.
[g]  Score, see NCBI BLAST(TM) Handbook.

*     Overlapped.
#     Combined.
+     See Text
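As a reading aid for footnotes [a] to [d], the %ID column can be reproduced from the other columns. The sketch below assumes %ID = 100 × (LEN − MIS − gapped positions) / LEN, rounded to two decimals. The function name `pct_identity` is hypothetical. For entry 8 the number of gapped positions (2) is not listed in Table 1; it is inferred here as LEN minus the S26 span (494 − 492), which assumes all gap positions fall in the query.

```python
def pct_identity(length: int, mismatches: int, gap_positions: int = 0) -> float:
    """Percent identity over an alignment of `length` columns."""
    matches = length - mismatches - gap_positions
    return round(100 * matches / length, 2)


# Ungapped rows from Table 1: (LEN, MIS) -> listed %ID
print(pct_identity(292, 1))   # entry 1, polar bear row, listed as 99.66
print(pct_identity(629, 3))   # entry 4, second black bear row, listed as 99.52
print(pct_identity(264, 3))   # entry 17, polar bear row, listed as 98.86

# Gapped row: entry 8, human EST. One gap opening is listed; its length of
# 2 positions is inferred from LEN (494) minus the S26 span (343954 - 343463 + 1).
inferred_gap_positions = 494 - (343954 - 343463 + 1)
print(pct_identity(494, 1, inferred_gap_positions))  # listed as 99.39
```

This is why a hit with one mismatch and one gap can score a lower %ID than a hit with two mismatches over the same range: a single gap can span more than one position.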