
Paper 1, Part II


   Since the black bear is the only known extant bear in California, the S26 sequence was compared to the black bear, Ursus americanus, in the Nucleotide Database.  There were only six hits, and none coincided with the longer sequence ranges of the best matches by score in Table 1.  However, five of the six hits are perfect matches (Table 3).  Of 232,065 total Ursidae entries in the Nucleotide Database (as of June 2013), 184,051 are panda entries, 38,952 are polar bear entries, and only 1,583 are black bear entries.  This explains why the black bear did not make the Table 1 hit list as a member of Ursidae: its limited entries in the Nucleotide Database do not cover everything spanned by the more extensive sequencing of the panda and the polar bear.  We also searched these six specific hit sequence ranges for other possible matches and found four other bears (polar bear, Ursus maritimus; Asiatic black bear, Selenarctos thibetanus; brown bear, Ursus arctos; and giant panda, Ailuropoda melanoleuca) at the top of the hit lists.  Human ranked far down the hit lists (Table 3).

Table 3.  S26 vs. black bear (Ursus americanus)

S26 vs.               Match           % ID   LEN  MIS  GAP  Start    End      SCORE

black bear            DQ240386.1      100    291  0    0    542355   542645   538
Asiatic black bear    DQ093584.1      100    291  0    0    542355   542645   538
brown bear            AY011500.1      100    291  0    0    542355   542645   538
brown bear            AF002239.1      100    291  0    0    542355   542645   538
human(348th)          HM763820.1      94.08  287  17   0    542355   542641   436

black bear            DQ240717.1      100    173  0    0    814867   815039   320
giant panda           XM_002926498.1  97.11  173  5    0    814867   815039   292
giant panda           DQ240718.1      97.11  173  5    0    814867   815039   292
Pacific walrus        XM_004409327.1  94.80  173  9    0    814867   815039   270
human                 Not among 33 best hits

black bear            DQ240717.1      100    97   0    0    815040   815136   180
giant panda           JN414914.1      100    97   0    0    815040   815136   180
giant panda           XM_002926498.1  100    97   0    0    815040   815136   180
giant panda           DQ240718.1      100    97   0    0    815040   815136   180
human                 Not among 92 best hits

black bear            DQ914964.1      93.67  79   5    0    1696406  1696484  119
Pacific walrus        XR_186719.1     96.20  79   2    1    1696406  1696484  128
S. American sea lion  AB714146.1      94.94  79   4    0    1696406  1696484  124
S. American sea lion  AB714145.1      94.94  79   4    0    1696406  1696484  124
human(133rd)          XM_003846557.1  91.14  79   7    0    1696406  1696484  108

black bear            DQ240386.1      100    53   0    0    542302   542354   99
polar bear            GAJD01025142.1  100    53   0    0    542302   542354   99
Pacific walrus        XM_004408656.1  100    53   0    0    542302   542354   99
giant panda           GU931015.2      100    53   0    0    542302   542354   99
human                 Not among 279 best hits

black bear            EU031728.1      100    36   0    0    1964424  1964459  67.6
No Other Matches      S26 1964424-1964459 vs. nucleotide

Table 3. The first entry in each group is the black bear match to S26. It is followed by the top three other hits in that sequence range and, finally, by the best human match over the same sequence range, with its rank in parentheses. Column headings are the same as in Table 1.


    It is worth noting in passing that all polar bear sequences are now in the Transcriptome Shotgun Assembly (TSA) database.  Therefore, conducting a routine Nucleotide Database search will miss these important matches, which demonstrates the need to search all relevant NCBI databases.

Sample 31

   A preliminary search of S31 against the entire Nucleotide Database yielded a long list of fungi and bacteria, so the search was narrowed.  We next searched S31 against the Nucleotide Database limited separately to human, to other primates (OP), to Canis, and to all other (AO) species not previously included.  “No match” entries were filled in by searching the Reference Genomic Sequence (RS) database, producing the results in Table 4.  Human was the best match over 12 of 15 unique sequence ranges (fungi and bacteria excluded).  Human tied other primates three times.  Most of the matches to human are so good, often 100% identity, that another extant hominin is unlikely.  No evidence was found of the human-primate mosaic mentioned in the Ketchum conclusion (2), above.  Such a mosaic would have resulted in some sequence ranges matching human best and other, different ranges matching other primates best.  The best match (by score) over a single range is a fungus, which may have grown on the bait food.  The second-highest score was questionable, with low %ID matches to the Nucleotide Database.  By searching that specific sequence range against the RS database, we found a better partial match (over a shorter sequence) to the pygmy chimpanzee.  When the complete nucleotide hit list is sorted by %ID, then LEN, then SCORE, the human hits surpass all other groups’ hits in all three parameters, especially over the 99-100% ID range.  A summary of Table 4 is found in Table 2.  Human, followed by other primates, greatly outmatches dog and all other species in all four criteria.
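The three-key sort described above (by %ID, then LEN, then SCORE) can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for the paper: the `Hit` record and `sort_hits` function are hypothetical names, and the four sample rows are copied from Table 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLAST hit; field names are illustrative, not an NCBI format."""
    group: str
    accession: str
    pct_id: float
    length: int
    score: float


def sort_hits(hits):
    """Sort descending by %ID, then by LEN, then by SCORE."""
    return sorted(hits, key=lambda h: (h.pct_id, h.length, h.score), reverse=True)


# Four rows taken from Table 4 (S31).
hits = [
    Hit("OP", "NC_006478.3", 99.57, 230, 420),
    Hit("human", "AL109624.11", 100.0, 216, 399),
    Hit("AO", "JQ689076.1", 99.67, 299, 547),
    Hit("human", "NG_021375.1", 100.0, 230, 425),
]

ranked = sort_hits(hits)
for h in ranked:
    print(h.group, h.accession, h.pct_id, h.length, h.score)
```

With this ordering, the two 100% ID human hits come first even though the fungus hit has the highest score, which is why the sort key matters when comparing groups.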
 

Table 4.  S31 top 15 human hits and top 15 other primate hits by score.

S31 vs.                            Match           %ID    LEN  MIS  GAP  Start   End        SCORE

human-N(1)                         3J3F_5          95.21  313  13   2    292664  292975     494
OP-N(1): chimpanzee                AC194985.3      94.25  313  16   2    292664  292975     477
Canis-N(1): dog                    2ZKR_0          95.21  313  13   2    292664  292975     494
AO-N: fungus                       JQ689076.1      99.67  299  1    0    292677  292975     547

human-N, G+T(2)                    AC126398.5      84.08  490  32   14   197482  197953     431
OP-N: chimpanzee                   AC148930.3      80.41  490  52   14   197482  197953     333
OP-RS: pygmy chimpanzee            NW_003862227.1  99.43  176  0    1    197482  197657(a)  318
Dog-N,RS                           No Match
AO-N: turquoise killifish          GAIB01046747.1  79.96  484  51   13   197490  197953     315

human-N(3)                         NG_021375.1     100    230  0    0    319943  320172     425
OP-RS(2): chimpanzee               NC_006478.3     99.57  230  1    0    319943  320172     420
Dog-N,RS                           No Match
AO-N:                              No Match

human-N(4)                         3J3F_5          100    218  0    0    323943  324160     403
OP-RS(3): n. white-cheeked gibbon  NC_019821.1     100    218  0    0    323943  324160     403
Canis-N: dog                       2ZKR_0          100    214  0    0    323943  324156     396
AO-N: ribbon worm(2)               HQ856869.1      88.89  216  19   5    323938  324151     261

human-N(5)                         AL109624.11     100    216  0    0    126039  126254     399
OP-RS(4): gorilla                  NC_018435.1     98.15  216  4    0    126039  126254     377
Dog-N,RS                           No Match
AO-N:                              No Match

human-N(6)                         NG_017040.1     99.05  211  1    1    251897  252106     377
OP-N(9): chimpanzee                AC183808.3      97.20  214  5    1    251897  252109     361
Dog-N,RS                           No Match
AO-N:                              No Match

human-N(7)                         AP001024.6      100    201  0    0    396912  397112     372
OP-RS(7): chimpanzee               NC_006478.3     99.50  201  1    0    396912  397112     366
Dog-N,RS                           No Match
AO-N:                              No Match

human-N(8)                         AP001788.5      100    200  0    0    254454  254653     370
OP-N(5): chimpanzee                NW_003870591.1  100    200  0    0    254454  254653     370
Dog-N,RS                           No Match
AO-N:                              No Match

human-N(9)                         AL133399.1      99.51  203  0    1    131309  131510     368
OP-RS(8): gorilla                  NC_018435.1     99.01  203  1    1    131309  131510     363
Dog-N,RS                           No Match
AO-N:                              No Match

human-N(10)                        AC104042.5      100    199  0    0    143355  143553     368
OP-N(11): chimpanzee               NC_006478.3     98.99  199  1    1    143355  143553     355
Dog-N,RS                           No Match
AO-N:                              No Match

human-N(11)                        AP003097.2      100    199  0    0    326842  327040     368
OP-RS(6): chimpanzee               NC_006478.3     100    199  0    0    326842  327040     368
Dog-N,RS                           No Match
AO: false killer whale             AP011079.1      80.79  151  24   4    326893  327040     113

human-N(12)                        AP000763.5      100    198  0    0    276637  276834     366
OP-RS(9): chimpanzee               NC_006478.3     99.49  198  1    0    276637  276834     361
Dog-N,RS                           No Match
AO-N:                              No Match

human-N(13)                        NM_002556.2     100    197  0    0    217275  217471     364
OP-N(13): pygmy chimpanzee         XM_003826362.1  98.48  197  3    0    217275  217471     348
Dog-N,RS                           No Match
AO-N: rabbit                       NM_001082233.1  87.31  197  21   2    217276  217471     222

human-N(13)                        EF445021.1      100    197  0    0    414777  414973     364
OP-N(12): pygmy chimpanzee         NW_003870010.1  98.98  197  2    0    414777  414973     353
Dog-N,RS                           No Match
AO-N: stealth virus                AF065698.1      84.26  108  17   0    414778  414885     106

human-N(15)                        AL136126.34     99.01  203  0    1    129971  130171     363
OP-RS: orangutan                   NC_012602.1     93.63  204  7    4    129971  130171     300
Dog-N,RS                           No Match
AO-N:                              No Match

human-N                            AP003440.2      100    195  0    0    430331  430525     361
OP-N(14): rhesus monkey            AC211797.4      97.44  195  5    0    430331  430525     333
Dog-N,RS                           No Match
AO-N:                              No Match

human-N                            AP003531.2      100    180  0    0    278035  278214     333
OP-N(15): gorilla                  XM_004051797.1  99.44  180  1    0    278035  278214     327
Dog-N                              XM_534014.3     92.86  112  7    1    278107  278218     161
AO-N: giant panda                  XM_002915314.1  95.54  112  4    1    278107  278218     178


Table 4. (a) Shorter sequence, best match. Format and abbreviations same as Tables 1 and 5.


Sample 140

   A preliminary search of S140 against the Nucleotide Database revealed that human was the best match by highest total score.  However, the dog (Canis lupus familiaris) had a higher maximum %ID, once again demonstrating the need to examine the downloaded list of individual hits.  Results of S140 searches against both the Nucleotide and the human G+T databases are found in Table 5.  The top 15 Canis (dog) hits and the top 15 human hits by score are listed.  Over all 17 of these sequence ranges, Canis lupus familiaris (the domestic dog) is the best match, better than human, other primates, and all other species in the database.  Further, when sorted by %ID, then LEN, then SCORE, the complete Nucleotide Database hit list is dominated by Canis hits, especially over the 99-100% ID range.  A final effort was made to find better mammalian matches for S140 by separately limiting hits to the order Carnivora; the families Canidae (dogs, wolves, coyotes, foxes), Felidae (cats), Mustelidae (weasels), Ursidae (bears), and Mephitidae (skunks); the clade Glires (rabbits and rodents); the tribe Sciurini (squirrels); the genus Vulpes (foxes); the species Mus musculus (the mouse); the northern raccoon (Procyon lotor); and the opossum (Monodelphis domestica).  No matches were as good as Canis lupus familiaris.  Table 2, a summary of Table 5, shows that the dog outmatches all other groups in all four criteria.
 

Table 5.  S140 top 15 Canis and top 15 human hits by score.

S140 vs.                      Match           % ID   LEN   MIS  GAP  Start    End      SCORE

Canis(1): dog                 XM_533992.4     97.81  1645  12   12   1249239  1250864  2817
human-N(3)                    NM_015885.3     93.30  1642  86   12   1249239  1250861  2401
OP: Bolivian squirrel monkey  XM_003935094.1  92.94  1642  89   13   1249239  1250861  2364
AO: giant panda               XM_002925391.1  95.80  1642  45   12   1249239  1250861  2628

Canis(1): dog                 XM_540535.3     98.61  1578  18   2    655259   656836   2789
human-N(1)                    NR_047674.1     94.99  1578  78   1    655259   656836   2475
OP: chimpanzee                XM_001155521.3  95.12  1578  75   2    655259   656836   2486
AO: Pacific walrus            XM_004404764.1  96.58  1578  50   3    655259   656836   2612

Canis(3): dog                 XM_847190.2     99.46  1470  6    1    1016493  1017960  2669
human-N(2)                    NM_006328.3     96.39  1470  51   2    1016493  1017960  2420
OP: Bolivian squirrel monkey  XM_003941220.1  96.53  1470  49   2    1016493  1017960  2431
AO: giant panda               XM_002927796.1  98.37  1470  22   1    1016493  1017960  2580

Canis(4): dog                 XM_855390.2     97.72  1488  15   2    802547   804034   2542
human-N(4)                    NR_024222.1     92.09  1492  91   7    802547   804034   2076
OP: gorilla                   XM_004051066.1  92.16  1492  90   7    802547   804034   2082
AO: Pacific walrus            XM_004399514.1  95.77  1488  44   2    802547   804034   2381

Canis(5): dog                 NC_006603.3     99.35  1237  7    1    382120   383356   2239
human-G+T(5)                  NW_004078070.1  94.76  1240  60   4    382120   383356   1925
OP: orangutan                 AC220940.4      95.00  1240  57   5    382120   383356   1941
AO: house mouse               AC124775.4      89.01  1247  109  20   382124   383354   1519

Canis(6): dog                 XM_845613.2     98.67  1207  5    7    723260   724455   2130
human-N(6)                    BC028235.1      93.37  1207  69   4    723260   724455   1775
OP: gorilla                   XM_004050991.1  93.29  1207  70   4    723260   724455   1770
AO: cat                       XM_003993197.1  95.69  1207  41   5    723260   724455   1932

Canis(7): dog                 XM_540754.2     99.62  1041  0    2    759403   760439   1897
human-N(7)                    AK000301.1      94.62  1040  52   2    759404   760439   1607
OP: chimpanzee                XM_508396.4     94.72  1041  51   2    759403   760439   1615
AO: giant panda               XM_002921620.1  97.60  1041  21   2    759403   760439   1781

Canis(8): dog                 XM_540888.2     96.78  1150  5    10   939919   941036   1890
human-N(11)                   AB384864.1      90.27  1151  80   9    939918   941036   1476
OP: rhesus monkey             XM_001115424.2  90.69  1149  75   9    939920   941036   1500
AO: giant panda               XM_002916673.1  92.17  1150  58   9    939919   941036   1596

Canis(9): dog                 XM_850778.2     98.67  981   3    4    2001821  2002791  1731
human-N(15)                   AB528444.1      93.39  984   55   4    2001818  2002791  1448
OP: white-cheeked gibbon      XM_003253412.2  93.50  984   54   4    2001818  2002791  1454
AO: horse                     XM_001505107.3  94.61  984   43   3    2001818  2002791  1515

Canis(10): dog                XM_533158.3     99.68  939   3    0    586655   587593   1718
human-N(8)                    NM_001278163.1  96.06  939   37   0    586655   587593   1530
OP: gorilla                   XM_004050910.1  96.38  939   34   0    586655   587593   1546
AO: Pacific walrus            XM_004406102.1  97.44  939   24   0    586655   587593   1602

Canis(11): dog                XM_540554.3     98.86  965   0    1    578923   579876   1711
human-N(12)                   NM_001076786.1  94.21  968   39   3    578923   579876   1461
OP: white-cheeked gibbon      XM_003254379.2  94.32  968   38   4    578923   579876   1467
AO: Pacific walrus            XM_004406106.1  96.38  968   21   4    578923   579876   1581

Canis(12): dog                XM_843660.2     99.89  898   0    1    1295384  1296280  1652
human-N                       NM_012193.3     94.22  899   51   1    1295383  1296280  1371
OP: pygmy chimpanzee          XM_003832981.1  94.22  899   51   1    1295383  1296280  1371
AO: Pacific walrus            XM_004414080.1  96.88  898   27   1    1295384  1296280  1502

Canis(13): dog                XM_860009.2     97.65  938   1    7    1034498  1035414  1591
human-N(10)                   JQ710744.1      95.52  938   21   7    1034498  1035414  1480
OP: chimpanzee                XM_003951945.1  95.74  938   19   7    1034498  1035414  1491
AO: giant panda               XM_002921298.1  96.48  938   12   7    1034498  1035414  1530

Canis(14): dog                XM_861143.2     96.46  961   16   9    992249   993192   1570
human-G+T                     NM_005507.2     92.00  963   56   12   992249   993192   1332
OP: chimpanzee                XM_003951935.1  92.21  963   54   12   992249   993192   1343
AO: Pacific walrus            XM_004393924.1  94.91  963   29   10   992249   993192   1489

Canis(15): dog                XM_003639786.1  96.67  932   12   6    154301   155232   1531
human-N(9)                    NM_001172705.1  95.71  933   20   7    154301   155232   1483
OP: white-cheeked gibbon      XM_003254938.2  95.82  934   17   9    154301   155232   1489
AO: giant panda               XM_002925273.1  96.24  932   17   5    154301   155232   1511

Canis: dog                    NC_006587.3     99.53  853   4    0    1581864  1582716  1554
human-N(13)                   AK131376.1      97.54  853   20   1    1581864  1582716  1458
OP: rhesus monkey             AC202613.6      97.30  853   22   1    1581864  1582716  1447
AO: house mouse               AC183268.4      93.10  855   38   8    1581864  1582716  1232

Canis: dog                    NC_006603.3     99.22  900   6    1    113521   114420   1622
human-N(14)                   AC091053.11     95.90  903   30   4    113521   114422   1456
OP: pygmy chimpanzee          NW_003870482.1  96.34  901   27   5    113521   114420   1476
AO: house mouse               JN950559.1      91.58  903   70   5    113524   114420   1242

Table 5. Format and abbreviations same as Tables 1 and 4.



Whole Genome Searches and Graphic Displays

   In the previous sections we compared the hits with the highest scores in each candidate group.  We also thought it important to compare all hits across the respective whole genomes at a common threshold of ≥ 95% ID.  We accomplished this by searching each sequence against the most relevant genomes in the Reference Genomic Sequence Database (RS): panda, human and dog.  Panda is the only bear in this database.  After removing duplicates and overlaps, we limited the results to ≥200 bp for S26, ≥150 bp for S31 and ≥100 bp for S140, reflecting the different query sequence lengths.  Results are shown in Table 6 for each sample sequence.
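The filtering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the paper does not say how duplicates and overlaps were removed, so the greedy rule here (keep the higher-scoring hit when two hits overlap on the query) is an assumption, and `Hit`, `overlaps`, and `filter_hits` are hypothetical names. The sample hits are invented for the example.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    start: int      # query start position
    end: int        # query end position
    pct_id: float   # percent identity
    length: int     # alignment length in bp
    score: float    # bit score


def overlaps(a, b):
    """True when two hits share at least one query position."""
    return a.start <= b.end and b.start <= a.end


def filter_hits(hits, min_pct_id=95.0, min_len=200):
    """Apply the %ID and length thresholds, then drop duplicates and overlaps.

    Assumption: overlaps are resolved greedily in favor of the higher score.
    """
    # A set removes exact duplicates among the hits that pass the thresholds.
    passing = {h for h in hits if h.pct_id >= min_pct_id and h.length >= min_len}
    kept = []
    for h in sorted(passing, key=lambda h: (-h.score, h.start, h.end)):
        if not any(overlaps(h, k) for k in kept):
            kept.append(h)
    return sorted(kept)  # back in query order


# Invented example hits, not data from the paper.
hits = [
    Hit(100, 399, 98.0, 300, 500.0),
    Hit(100, 399, 98.0, 300, 500.0),    # exact duplicate
    Hit(250, 549, 96.0, 300, 450.0),    # overlaps the first hit, lower score
    Hit(600, 849, 94.0, 250, 400.0),    # below 95% ID
    Hit(900, 1049, 99.0, 150, 270.0),   # shorter than 200 bp
    Hit(1200, 1449, 97.5, 250, 430.0),
]

kept = filter_hits(hits, min_pct_id=95.0, min_len=200)
print(kept)
```

The `min_len` argument would be set per sample to the length threshold that applies to that query sequence.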

 

Table 6.  Statistics of Reference Genomic Sequence Matches

RefSeq   M1j(a)  M2j(b)  Ave. %ID(c)  Nj(d)  % C(e)  Norm. %C(f)  |  Ave. %ID(c)  Nj(d)

vs. S26  << %ID ≥ 95%,  Length ≥ 200 bp >>                        |  <<<<<  Whole Genome >>>>>
Panda    1929    7117    98.3         3417   74.2    ≡100         |  98.1         11865
Dog      1299    3684    97.2         2198   49.7    67.0         |  96.3         11472
Human    833     1711    96.3         696    40.7    54.9         |  96.0         78085

vs. S31  << %ID ≥ 95%,  Length ≥ 100 bp >>                        |  <<<<<  Whole Genome >>>>>
Human    891     4261    99.6         4308   86.4    ≡100         |  97.5         163491
Dog      795     3623    99.1         110    3.3     3.9          |  98.6         406
Panda    742     3188    98.9         73     2.4     2.8          |  98.5         126

vs. S140 << %ID ≥ 95%,  Length ≥ 150 bp >>                        |  <<<<<  Whole Genome >>>>>
Dog      1902    8523    99.0         3887   81.3    ≡100         |  98.3         34764
Panda    1084    3208    97.3         2198   48.0    59.1         |  96.3         4605
Human    779     1953    96.6         933    38.6    47.5         |  95.9         115702
 
Table 6.  (a)  First Moment, Equn. 1.3, score × %.  (b)  Second Moment, Equn. 1.4, score × %².  (c)  Equn. 1.5.  (d)  Number of hits.  (e)  Equn. 1.6.  %s are not additive because sequences are conserved.  (f)  Normalized to highest %C ≡ 100.
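The normalization in footnote (f) is simple enough to check by hand. The sketch below assumes it means dividing each %C by the largest %C in the same sample block and multiplying by 100; `normalize_coverage` is a hypothetical helper name, and the inputs are the S26 %C values from Table 6.

```python
def normalize_coverage(pct_c):
    """Scale each %C so that the largest value in the block becomes 100."""
    top = max(pct_c.values())
    return {name: round(100.0 * value / top, 1) for name, value in pct_c.items()}


# %C values for S26, copied from Table 6.
s26_pct_c = {"Panda": 74.2, "Dog": 49.7, "Human": 40.7}

s26_norm = normalize_coverage(s26_pct_c)
print(s26_norm)
```

For S26 this reproduces the Norm. %C column. Because the inputs here are the already rounded %C values, a recomputed figure can differ from the published one in the last digit for some rows of the other samples.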
 

 
   Although average %ID can discriminate among groups, the numbers can be very tight (S31) or occasionally reversed (S31 Whole Genome).  In these cases M2j, Nj, and %C are more numerically discriminating. 

   Of all the numbers presented in Tables 1-6, only the reversal noted above seems too low and out of line with the other S31 statistics: the 97.5 average %ID (in red in Table 6) for the S31 hits against the human reference genomic sequence.  We found that many low %ID hits in the 85%-90% ID range had corresponding high %ID hits over the same sequence range.  The reasons these sequence ranges are so hypervariable were not immediately obvious but might merit further investigation.  We do believe, however, that the 99.6 average %ID over 4308 hits with %ID ≥ 95% is a very strong indication that the sample is human.

   These results are consistent with the previous highest score results in Tables 1, 4, and 5:  In Table 6 S26 best matches the panda, S31 human, and S140 dog, as measured by highest first and second moments (Equns. 1.3 and 1.4), average %ID, number of hits, and % coverage. 

   Although these summary statistics are convincing, we decided to display individual hits graphically to examine the coverage across the entire sample sequences, as seen in Figs. 1-6.  We added results of Nucleotide Database searches of other primates (OP) for perspective.  The same matching sequence length criteria and %ID minimum (95%) were used in Figs. 1-6 as in Table 6.  The dense horizontal lines of points in Figs. 3, 4 and 5 contain most of the data points of these three figures.  Clearly, S26 matches panda best, S31 matches human best, and S140 matches dog best, as judged by the highest concentrations of data points near 100% ID in Figs. 1, 3, and 5 and the highest overall second moments in Figs. 2, 4, and 6, confirming the overall statistics in Table 6.  Furthermore, these best matches apply across the entire sample sequence; that is, there is no significant “mosaicity”, meaning the interspersion of hit sequences with equally high match statistics from different groups.  For example, there are far fewer nonhuman data points (Table 6) in Fig. 3, and these are not concentrated in the 99-100% ID range.
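One way to make the mosaicity test concrete is to ask, for each query sequence range, which group has the best score, and then check whether different groups win outright on different ranges. The sketch below is only an illustration of that idea, not the authors' method; `best_groups_by_range` and `looks_mosaic` are hypothetical names, and the scores are human and other-primate (OP) rows taken from Table 4, keyed by the range start position.

```python
from collections import defaultdict


def best_groups_by_range(hits):
    """Map each range to the set of groups sharing the top score there.

    hits: iterable of (range_id, group, score) tuples.
    """
    by_range = defaultdict(list)
    for range_id, group, score in hits:
        by_range[range_id].append((group, score))
    winners = {}
    for range_id, entries in by_range.items():
        top = max(score for _, score in entries)
        winners[range_id] = {group for group, score in entries if score == top}
    return winners


def looks_mosaic(winners, group_a="human", group_b="OP"):
    """True only if each group is the sole best match on at least one range."""
    a_only = any(w == {group_a} for w in winners.values())
    b_only = any(w == {group_b} for w in winners.values())
    return a_only and b_only


# (range start, group, score) for four S31 ranges from Table 4.
s31_hits = [
    (319943, "human", 425), (319943, "OP", 420),
    (323943, "human", 403), (323943, "OP", 403),
    (126039, "human", 399), (126039, "OP", 377),
    (254454, "human", 370), (254454, "OP", 370),
]

winners = best_groups_by_range(s31_hits)
print(winners)
print("mosaic:", looks_mosaic(winners))
```

On these four ranges human either wins or ties, and other primates never win outright, so the check reports no mosaic. Ties are deliberately not counted as a win for either group.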

(See Figures in Separate Page)


   We observed well-established phylogenetic relationships in the overall distributions of the data points: human is closer to other primates and dog is closer to panda, the same as was found in Tables 1, 4, and 5.  The vertical banding seen in Figs. 1, 2, 5, and to a lesser degree 6, is likely due to a higher concentration of database sequences at well-studied, important, conserved genes.


CONTINUED
 
 
 
