
Paper 3


Specific Gene DNA Sequences: The Ketchum et al. Mistakes Revealed

By Haskell V. Hart        August 26, 2014

© 2014, Haskell V. Hart, all rights reserved.  May be reproduced for individual use only.

Abstract

Specific nuclear gene DNA sequences from the Ketchum et al. study of purported bigfoot/sasquatch samples are identified for species and compared to their results.  Serious errors in interpretation were found in the latter, which were tied to incomplete searches of the available DNA databases.  Nevertheless, a few human-like samples might merit further investigation.  The overarching conclusion of this and two previous studies is that bigfoot/sasquatch was not proven to exist by Ketchum et al.

Introduction

The Ketchum paper, “Novel North American Hominins: Next Generation Sequencing of Three Whole Genomes and Associated Studies,”[1] gives no details of BLAST™ search/match strategy or results, but claims “homology” to humans for various DNA sequences.  Previous work showed that two of the three nuDNA sequence interpretations were incorrect, with best matches to a bear and a dog,[2,3,4] and that eight of 18 complete mtDNA sequences are statistical outliers with less than 1% probability of occurring in the human population.[5]  The more recent Sasquatch Genome Project (SGP) website, www.sasquatchgenomeproject.org, reveals more mistakes of omission, this time with enough results and narrative to clearly demonstrate where Ketchum et al. went wrong.  The present paper was written to explain these mistakes and to show that, had the basic principles[3] of this kind of search/match been followed, publication of such a paper[1] with unsupported claims would have been avoided.

Computer Methods

We used the same search/match software that was used by Ketchum et al., BLAST™ [6,7], http://blast.ncbi.nlm.nih.gov/Blast.cgi, for matching (aligning) a sequence with the databases of known sequences, which are maintained by the National Center for Biotechnology Information (NCBI), http://www.ncbi.nlm.nih.gov/.  Sequences were taken from the SGP website, “Supplemental Raw Data” tab, and from “Supplementary Data 3” in the original paper[1], also available on the SGP website “View DNA Study” tab.  The SGP sequences were cut and pasted into a text file, and a title row beginning with “>” was added to conform to FASTA format as required by BLAST™.  The Supplementary Data 3 sequences were already in FASTA format, ready to paste as BLAST™ input.  A variation, Primer-BLAST™, was used to search Amel X and Amel Y amplicons (including primers) for possible coincidental, nonhuman matches.  Online help adequately described the input requirements.
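The FASTA preparation step described above can be sketched in a few lines of Python.  This is only an illustration, not the procedure actually used (the sequences were edited by hand in a text file); the function name `to_fasta`, the 70-character line width, and the example title are assumptions made for the sketch.

```python
def to_fasta(title, raw_sequence, width=70):
    """Return a FASTA record: a '>' title row followed by wrapped sequence lines.

    `title` and `width` are illustrative choices; a '>' title row is the part
    of the format the text above refers to.
    """
    # Keep letters only, so pasted position numbers, spaces and line breaks are dropped
    bases = "".join(ch for ch in raw_sequence if ch.isalpha()).upper()
    if not bases:
        raise ValueError("no sequence letters found")
    # Break the sequence into fixed-width lines
    lines = [bases[i:i + width] for i in range(0, len(bases), width)]
    return ">" + title + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical title and a toy sequence, for illustration only
    print(to_fasta("SGP_S25-26_example", "acgt acgt\n1 ACGTACGT", width=8))
```

The resulting text can be pasted into the BLAST™ query box as is.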

All NCBI databases and BLAST™ software are available free to the public, making it possible for anyone to validate the results presented below.  One need not be a geneticist to do so; one simply needs to spend some time learning the nuances of the search software and the databases.

Results and Discussion

The “Supplemental Raw Data” tab on the SGP website opens up a dissertation by Dr. Ketchum on validating sequence electropherograms and matching specific gene sequences with species in the database, obviously only the “Nucleotide Collection” database, as our results will show.  Its purpose was “to put to rest any questions about our study and the data quality.”  Our issue is with the interpretation of some of these sequences, the identifications.   

Table 1 presents our search/match results for the Ketchum sequences in the website’s Supplemental Raw Data (top part) and in the Ketchum et al. paper[1] (bottom part).

 

Table 1.  Search/Match Results for Specific Gene Sequences

 

Gene | Search / Accession Match | %ID | Length | Score | ID

From SGP Website

MC1R | SGP S25-26 MC1R vs. Nucleotide | | | |
 | JN575070.1 | 86.02 | 794 | 852 | black bear
 | AY884206.1 | 83.58 | 615 | 577 | giant panda
 | AB598380.1 | 100 | 824 | 1522 | human

PNLIP | SGP S25-26 PNLIP vs. Nucleotide | | | |
 | NG_023311.1 | 100 | 215 | 398 | human
 | AL731653.1 | 100 | 215 | 398 | human

M16 (My16) | SGP S25-26 M16 vs. Nucleotide | | | |
 | AC005163.3 | 100 | 235 | 435 | human
 | BK001410.1 | 100 | 235 | 435 | human

Amel X | SGP S25-26 AmelX vs Nucleotide | | | | Unknown
 | SGP S25-26 AmelX vs RefSeq | | | |
 | NC_006585.3 | 99.46 | 186 | 337 | dog
 | NW_003726054.1 | 99.46 | 186 | 337 | dog
 | SGP S25-26 AmelX vs Genomes (chromosome) | | | |
 | NC_006585.3 | 99.46 | 186 | 337 | dog
 | SGP S25-26 Amel X vs WGS | | | |
 | AOCS01003912.1 | 99.46 | 186 | 337 | dog
 | AAEX03002445.1 | 99.46 | 186 | 337 | dog
 | AACN010745680.1 | 99.46 | 186 | 337 | dog

Amel Y Exon2 | SGP S25-26 Amel Y Ex2 vs Nucleotide | | | |
 | XM_004792766.1 | 83.06 | 484 | 412 | domestic ferret / European polecat *
 | SGP S25-26 Amel Y Ex2 vs RefSeq | | | |
 | NW_007907173.1 (NEW August 4, 2014) | 100 | 463 | 856 | polar bear
 | NW_003219866.1 | 95.27 | 465 | 734 | giant panda
 | SGP S25-26 Amel Y Ex2 vs WGS | | | |
 | AVOR01061180.1 (NEW August 12, 2014) | 100 | 463 | 856 | polar bear
 | ACTA01188137.1 | 95.27 | 465 | 734 | giant panda
 | SGP S25-26 Amel Y Ex 2 vs Genomes | | | |
 | AVOR01061180.1 (NEW August 12, 2014) | 100 | 464 | 856 | polar bear
 | ACTA01188137.1 | 95.27 | 465 | 734 | giant panda
 | SGP S25-26 Amel Y Ex2 vs TSA | | | |
 | AVOR01061180.1 (NEW August 12, 2014) | 100 | 463 | 856 | polar bear
 | ACTA01188137.1 | 95.27 | 465 | 734 | giant panda

HAR1 | SGP S25-26 HAR1 vs Nucleotide | | | | Unknown
 | SGP S25-26 HAR1 vs RefSeq | | | |
 | NW_007907114.1 (NEW August 4, 2014) | 98.08 | 208 | 363 | polar bear
 | NW_003219288.1 | 92.86 | 238 | 346 | giant panda
 | SGP S25-26 HAR1 vs Genomes | | | |
 | NW_007907114.1 (NEW August 4, 2014) | 98.08 | 208 | 363 | polar bear
 | SGP S25-26 HAR1 vs TSA | | | | Unknown
 | SGP S25-26 HAR1 vs WGS | | | |
 | AVOR01033236.1 (NEW August 12, 2014) | 98.08 | 208 | 363 | polar bear
 | ACTA01148086.1 | 92.86 | 238 | 346 | giant panda

From Supplementary Data 3 of the Ketchum et al. paper

S26 & S35 TAP1 | Sequence3 S26 TAP1 vs Nucleotide | | | |
 | AB528393.1 | 100 | 482 | 891 | human

S10 TAP1 | Sequence5 S10 TAP1 vs Nucleotide | | | |
 | AC190393.6 | 95.77 | 189 | 307 | dog
 | AC188661.8 | 95.77 | 189 | 307 | dog
 | Sequence5 S10 TAP1 vs RefSeq | | | |
 | NC_006588.3 | 99.18 | 245 | 444 | dog
 | NW_003726065.1 | 99.18 | 245 | 444 | dog

S33 TAP1 | Sequence6 S33 TAP1 vs Nucleotide | | | |
 | KF523403.1 | 100 | 169 | 313 | human mtDNA
 | JX669424.1 | 100 | 169 | 313 | human mtDNA
 | Sequence6 S33 TAP1 vs Human G+T | | | |
 | NC_012920.1 | 99.41 | 169 | 307 | human mtDNA
 | Sequence6 S33 TAP1 vs RefSeq | | | |
 | NC_011137.1 | 100 | 169 | 313 | human mtDNA

S43 TAP1 | Sequence7 S43 TAP1 vs Nucleotide | | | | Unknown
 | Sequence7 S43 TAP1 vs RefSeq | | | | Unknown
 | Sequence7 S43 TAP1 vs Genomes | | | | Unknown
 | Sequence7 S43 TAP1 vs TSA | | | | Unknown
 | Sequence7 S43 TAP1 vs WGS | | | | Unknown
 | Sequence7 S43 TAP1 vs Mouse G + T | | | | Unknown
 | Sequence7 S43 TAP1 vs Human G + T | | | | Unknown
 | Nine additional databases searched.** | | | | Unknown

S44 TAP1 | Sequence8 S44 TAP1 vs Nucleotide | | | | Unknown
 | Sequence8 S44 TAP1 vs RefSeq | | | | Unknown
 | Sequence8 S44 TAP1 vs Genomes | | | | Unknown
 | Sequence8 S44 TAP1 vs TSA | | | | Unknown
 | Sequence8 S44 TAP1 vs WGS | | | | Unknown
 | Sequence8 S44 TAP1 vs Mouse G + T | | | | Unknown
 | Sequence8 S44 TAP1 vs Human G + T | | | | Unknown
 | Nine additional databases searched.** | | | | Unknown

S39b TAP1 | Sequence9 S39b TAP1 vs Nucleotide | | | |
 | AB528393.1 | 100 | 373 | 689 | human

S35 & S37 My16 | Sequence1 S35 My16 & Sequence2 S37 My16 vs Nucleotide | | | |
 | AC005163.3 | 100 | 298 | 551 | human
 | BK001410.1 | 100 | 298 | 551 | human

S26 Amel X | Sequence10 S26 Amel X vs Nucleotide | | | | Unknown
 | S26 Amel X vs Genomes | | | | Unknown
 | S26 Amel X vs TSA | | | | Unknown
 | S26 Amel X vs RefSeq | | | | Unknown
 | S26 Amel X vs WGS | | | | Unknown
 | S26 Amel X vs Mouse G + T | | | | Unknown
 | S26 Amel X vs Human G + T | | | | Unknown
 | Nine additional databases searched.** | | | | Unknown

S35 Amel X | Sequence12 S35 AmelX vs Nucleotide | | | |
 | AY694861.1 | 99.19 | 619 | 1125 | human
 | NG_012494.1 | 99.51 | 610 | 1116 | human

S35 Amel Y | Sequence13 S35 AmelY vs. Nucleotide | | | |
 | NG_008011.1 | 99.86 | 701 | 1290 | human

S43 “Amel” | Sequence14 S43 “Amel” vs Nucleotide | | | | Unknown
 | Sequence14 S43 “Amel” vs RefSeq | | | | Unknown
 | Sequence14 S43 “Amel” vs Genomes | | | | Unknown
 | Sequence14 S43 “Amel” vs TSA | | | | Unknown
 | Sequence14 S43 “Amel” vs WGS | | | | Unknown
 | Sequence14 S43 “Amel” vs Mouse G + T | | | | Unknown
 | Sequence14 S43 “Amel” vs Human G + T | | | | Unknown
 | Nine additional databases searched.** | | | | Unknown

S44 & S43 “Amel” | Sequence15 S44 & Sequence16 S43 “Amel” vs Nucleotide | | | | Unknown
 | Sequence15 S44 & Sequence16 S43 “Amel” vs RefSeq | | | | Unknown
 | Sequence15 S44 & Sequence16 S43 “Amel” vs Genomes | | | | Unknown
 | Sequence15 S44 & Sequence16 S43 “Amel” vs TSA | | | | Unknown
 | Sequence15 S44 & Sequence16 S43 “Amel” vs WGS | | | | Unknown
 | Sequence15 S44 & Sequence16 S43 “Amel” vs Mouse G + T | | | | Unknown
 | Sequence15 S44 & Sequence16 S43 “Amel” vs Human G + T | | | | Unknown
 | Nine additional databases searched.** | | | | Unknown

S44 “Amel” | Sequence17 S44 “Amel” vs Nucleotide | | | | Unknown
 | Sequence17 S44 “Amel” vs TSA | | | | Unknown
 | Sequence17 S44 “Amel” vs RefSeq | | | | Unknown
 | Sequence17 S44 “Amel” vs Genomes | | | | Unknown
 | Sequence17 S44 “Amel” vs WGS | | | | Unknown
 | Sequence17 S44 “Amel” vs Mouse G + T | | | | Unknown
 | Sequence17 S44 “Amel” vs Human G + T | | | | Unknown
 

 

Table 1. * The accession renamed 6-13-13 as domestic ferret (Mustela putorius furo), a subspecies of European polecat (Mustela putorius).  ** Every NCBI database was searched.  Some have redundant data. Abbreviations: Query sequence and NCBI database underlined.  Nucleotide = Nucleotide Collection. RefSeq = Reference Genomic Sequence.  TSA = Transcriptome Shotgun Assembly.  WGS = Whole Genome Shotgun –Contigs.  Human G+T, Human Genomic plus Transcript.  Mouse G + T, Mouse Genomic plus Transcript. Genomes, NCBI Genomes (Chromosome).  %ID, percentage of base pairs which align.  Length, number of consecutive base pairs which align.  Score, See [3,6,7] and http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html.  ID, identity of matching sequence.  Unknown, no matches, “No significant similarity found.” Different from FTS (failure to sequence) in Table 2.  Dog, definitely Canis, but could be wolf or coyote.  “Amel”, no X or Y was reported by Ketchum et al.

 

Beginning at the top for Samples 25 & 26 (hair and flesh, respectively, of same sample), we agree with the SeqWright interpretations of human for the genes MC1R (Melanocortin 1 Receptor) and PNLIP (Pancreatic Lipase), as indicated in their table near the top of the webpage, also found by Ketchum to be human with accession numbers identical to ours.   The M16 (My16, MYH16: Myosin 16 Heavy Chain) sequence is also human, but had no interpretation by Ketchum on the website.

Moving down to the next gene, Amel (Amelogenin) X, SeqWright called the sequence “strange,” but we found it matched dog (Canis lupus familiaris) sequences at 99.46 %ID for all 186 bases in the Genomic Reference Sequence (RefSeq), Genomes (chromosome), and Whole Genome Shotgun (WGS) databases, and nothing in the Nucleotide database, which we suspect was the only database SeqWright queried.  We do not agree with the mixed human and dog interpretation of Ketchum on the website, which was based on her Nucleotide Collection (only) alignment of only 73 and 89 bases, respectively, with 88-91 %ID.  Accession dates for our best matches were 2003 and 2006, well before the initiation of the Ketchum et al. study and the SGP website, so evidently these databases were not searched by her.  Ketchum said it was “loosely aligned” with dog.  We found 99.46 %ID alignment with dog, which, given the hybrid nature of dogs, is very well aligned.  IT’S A DOG, wolf or coyote.  Curiously, this sequence does not align with the Amel X sequence (Sequence 10) in Supplementary Data 3 mentioned below, which is also listed as failure to sequence in the Ketchum paper Table 4 (see Table 2 below).  We found no matches for Sequence 10.  Dog, FTS, and unknown all for the same gene?  It cannot be a single species.
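To make the %ID figures concrete: 99.46 %ID over 186 aligned positions is consistent with 185 matching positions out of 186 (185/186 = 99.46%).  The sketch below shows that arithmetic for a pair of already-aligned strings.  It is a simplified illustration, not BLAST™ itself; the function name and the toy sequences are assumptions, and gaps are assumed to be written as “-”.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percentage of alignment columns in which both sequences carry the same base.

    Both strings must already be aligned (same length, '-' for gaps).
    A gap never counts as a match.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"
    )
    return round(100.0 * matches / len(aligned_query), 2)


if __name__ == "__main__":
    # Toy data: 186 positions with a single mismatch at the end
    query = "A" * 186
    subject = "A" * 185 + "G"
    print(percent_identity(query, subject))
```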

Next on the list is Amel Y Exon 2.  Called a “strange sequence” by SeqWright and “unknown” by Ketchum on the website and in the paper Table 4, it is anything but.  IT’S A BEAR, not an “unknown” species.  The sequence matches polar bear (best) and giant panda in three databases with 100 and 95.27 %ID, respectively.  Again Ketchum and associates came up with Nucleotide Collection ONLY matches of only 77 %ID for Rhesus monkey and 83 %ID for European polecat (a ferret).  The BLAST™ output screen reproduced on the webpage confirms the database as Nucleotide Collection: “Database Name nr” and “Description nucleotide collection (nt),” and we got the same result of European polecat that Ketchum did for that database ALONE.  No other searches were mentioned or implied on the website.  The primers S1100424.DND-CP29 and S1100424.DND-CP30 from Supplementary Data 12 of Ketchum et al. for human Amel Y Exon 2 would not amplify the sequence on the webpage, nor does this sequence match (“No significant alignment found”) the predicted human amplicon for this gene using these primers.  (See below, Table 3.)

In California a wild polar bear or giant panda is very unlikely.  However, due to their threatened status, the polar bear (Ursus maritimus) and the giant panda (Ailuropoda melanoleuca) have far more sequence data (complete genomes[8,9]) in the databases than the black bear (Ursus americanus), the previously proven identity of the sample.[2,3,4]  The match to polar bear is better than that to panda, because the polar and black bears are in the same genus.  The sole member of its genus, the panda is a more distant relative (22-24 MyBP split), and therefore is often called a “living fossil”.

Throughout our previous study [3] we found 12 rules to apply to this kind of species identification.  Three are especially applicable here:

1. “One cannot match what is not in the database.”

4. “Good matches to closely related species at these levels (genus, family) may indicate that the species of interest is relatively under-represented in the database compared to its kin.” 

9. “Nucleotide, Genome plus Transcription, Reference Sequence and Shotgun Assembly databases should be searched.” 

These principles must always be kept in mind whenever a search results in no matches or good but not exact matches (~95-99%ID).
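Rule 9 amounts to repeating the same query against every database and only then deciding what the best match is.  A minimal sketch follows, assuming a caller-supplied `search(query, database)` function that returns a list of hits; that function, the hit dictionaries, and the database labels (borrowed from the abbreviations in Table 1, not actual BLAST™ database identifiers) are all hypothetical.

```python
# Labels borrowed from Table 1; they are placeholders, not BLAST database identifiers
DATABASES = ["Nucleotide", "RefSeq", "Genomes", "TSA", "WGS"]


def search_all(query, search, databases=DATABASES):
    """Run search(query, db) for every database and keep the hits per database.

    `search` is a hypothetical callable returning a list of dicts, each with at
    least a 'score' key. An empty list corresponds to "Unknown" in Table 1.
    """
    return {db: search(query, db) for db in databases}


def best_hit(results):
    """Return (database, hit) for the highest-scoring hit, or None if nothing matched."""
    candidates = [(db, hit) for db, hits in results.items() for hit in hits]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1]["score"])


if __name__ == "__main__":
    # Stand-in search that mimics the S25-26 Amel X pattern in Table 1:
    # nothing in Nucleotide, a dog match in RefSeq
    def fake_search(query, db):
        if db == "RefSeq":
            return [{"accession": "NC_006585.3", "pct_id": 99.46, "score": 337}]
        return []

    print(best_hit(search_all("ACGT", fake_search)))
```

A search that stops after the first database would report “Unknown” in this example, which is the failure mode discussed in this section.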

Finally, for the HAR1 (Human Accelerated Region 1) gene, Ketchum on the SGP website shows “No significant similarity found” in the reproduced BLAST™ output screen and called it a “Completely unknown sequence.”  Again, only the Nucleotide Collection was searched, or at least it was the only output presented on the website.  We found matches to polar bear and giant panda in three OTHER databases with 98.08 and 92.86 %ID, respectively.  It matches nothing else well; the next closest, but poor, matches are the Weddell seal (83 %ID) and the northern Pacific walrus (87 %ID), and human is in 58th place on the list by score.  Interestingly, the scores correlated well with the known phylogeny of Carnivora, thereby reinforcing our match to bear.  As above, black bear is the more likely actual origin of the sample.  IT’S A BEAR.  If the %ID is not good enough for an exact species match (>99%) and/or a matching species is way out of its range, such a finding should not be dismissed as “impossible”; here it’s a clue that a bear is likely, but more data are needed to decide which bear.  Similar to the case of Amel Y Exon 2 above, we could not align the HAR1 sequence from the website with the predicted human amplicon using the HAR primers in Supplementary Data 12, or indeed with anything human, so we do not believe this sequence could have been amplified by these primers.  Its gene is unknown.
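The reasoning in this paragraph can be written as a small decision rule.  The sketch below is only one possible reading of the text: the >99% cut-off and the ~95-99% band are taken from the wording above, while the function name, the `species_in_range` flag, and the returned phrases are assumptions introduced for illustration.

```python
def interpret_match(pct_id, species_in_range=True):
    """Turn a best-match %ID into a cautious interpretation.

    pct_id            best %ID found across all databases searched
    species_in_range  False if the matched species does not occur where the
                      sample was collected (e.g. polar bear in California)
    """
    if pct_id > 99.0 and species_in_range:
        return "species-level match"
    if pct_id >= 95.0:
        # Good but not exact, or exact to an out-of-range species:
        # look to under-represented members of the same genus or family
        return "close relative likely; true species may be absent from the database"
    return "weak match; not a basis for identification"


if __name__ == "__main__":
    print(interpret_match(98.08, species_in_range=False))  # HAR1 vs polar bear
    print(interpret_match(99.46))                          # Amel X vs dog
    print(interpret_match(83.06))                          # Amel Y Ex2 vs polecat
```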

It must be mentioned here that as this paper was nearing completion, new polar bear data were added to the databases (so indicated in Table 1).  However, the giant panda data were there well before the publication of the Ketchum et al. paper.  Had they searched any one of three additional databases in Table 1, they would have found a good match to the giant panda for the Amel Y Exon 2 and HAR1 genes.

This concludes the discussion of the SGP website data from S25 & S26.

On to the second part of Table 1; these are sequences from the Ketchum et al. paper, Supplementary Data 3.

S26 & S35 TAP1 (Antigen Peptide Transporter 1) were identical sequences.  We found 100%ID human for 482 bases, which agrees with SeqWright for S26. S39b was 100% ID human as well. There is no specific mention of S35 or S39b TAP1 in the text of the Ketchum paper or on the SGP website (yet it’s in Supplementary Data 3). 

Table 7 in Ketchum et al. reports “Unknown” for TAP1 sequences of Samples 10, 33, 43, and 44.  We convincingly matched dog (99.18 %ID) and human mitochondrial (100 %ID) for S10 and S33, respectively.  S33 positions 1-313 align with the minus (reverse) strand of S44, positions 313 to 1, neither of which found a match.  Mitochondrial sequences matched S33 positions 424-592 (end).  A mitochondrial sequence should not be primed and sequenced by TAP1 nuclear gene primers, nor should it be part of a strand with an “unknown” segment, since the entire human mitochondrial sequence is known and in the database.  Also, a plus strand matching a minus strand in two samples presumably amplified and sequenced with the same primers is bizarre.  Hence, as discussed below, multiple, uncontrolled amplification and sequencing must have occurred with S33 and, by implication, possibly S44.  S10 and S43 do not align as claimed by Ketchum et al. Table 7; otherwise S43 would have matched dog too (Table 1).  S43 and S44 had no matches, which is consistent with Ketchum et al. Table 7 (“Unknown”).
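A plus strand “matching the minus strand” of another sequence means that one sequence equals the reverse complement of the other.  The check can be sketched as follows; the function names and the toy sequences are illustrative assumptions, and only exact matches of unambiguous bases (A, C, G, T) are handled.

```python
# Translation table for complementing unambiguous DNA bases
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the minus-strand reading (reverse complement) of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def matches_minus_strand(seq_a, seq_b):
    """True if seq_a is exactly the reverse complement of seq_b."""
    return seq_a.upper() == reverse_complement(seq_b).upper()


if __name__ == "__main__":
    # Toy sequences, not the S33/S44 data
    print(reverse_complement("ATGC"))
    print(matches_minus_strand("ATGC", "GCAT"))
```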

S35 and S37 My16 sequences are identical and match human at 100 %ID.  The Ketchum et al. paper states, “…all DNA samples that successfully amplified yielded results consistent with human and aligned with the human reference sequence…”  However, we are not told which samples those are.

Curiously, the S26 Amel X sequence in Supplementary Data 3 (Sequence 10) does not match the S26 Amel X sequence above on the website (dog), nor does it match any sequence in any of the databases.   No explanation was given in either location.

S35 Amel X is  human, but was listed in Table 4 of the Ketchum paper as failure to sequence and in Ketchum Table 5 as having X allele dropout (Y only).  S35 Amel Y (unknown exon) is also human, as were three of its 5 exons in Ketchum et al. Table 4.  The other two exons failed to sequence.

Sequence 14 S43 “Amel”, Sequence 15 S44 “Amel”, Sequence 16 S43 “Amel”, and Sequence 17 S44 “Amel” did not match  any sequence in the databases.  No X, Y or exon designation is given in Supplementary Data 3.  Curiously, Sequences 15 and 16 are identical, and Sequences 14 and 17 match except for 12 extra bases at the end of 14.  These sequences are not adequately labeled.  Are some Amel X and others Amel Y?
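The two kinds of duplication noted above (identical sequences, and sequences that differ only by extra bases at one end) can be checked mechanically.  A minimal sketch, assuming the sequences are plain strings of bases; the function name and the toy sequences are hypothetical, and the Supplementary Data 3 sequences themselves are not reproduced here.

```python
def compare_sequences(seq_a, seq_b):
    """Classify two sequences as identical, identical up to a longer tail, or different."""
    a, b = seq_a.upper(), seq_b.upper()
    if a == b:
        return "identical"
    longer, shorter = (a, b) if len(a) >= len(b) else (b, a)
    if longer.startswith(shorter):
        extra = len(longer) - len(shorter)
        return f"identical except for {extra} extra bases at the end of the longer sequence"
    return "different"


if __name__ == "__main__":
    # Toy sequences only
    base = "ACGTTGCA"
    print(compare_sequences(base, base.lower()))
    print(compare_sequences(base + "A" * 12, base))
```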

A summary Table 2 of all the above results in Table 1 follows.  Other related results from the Ketchum et al. paper (Tables 4 and 7) are added for comparison.


 

Table 2.  Summary of Gene Sequence Identification

 
Sample
S25 & 26
S10
S33
S35
S37
S39b
S43
S44
Gene
Source
MC1R
SGP
H
KP
H
H
H
HSGP
H
 
PNLIP
SGP
H
KP
HSGP
H
 
M16 (My16)
SGP
NI
KP
H/a ?
H/a ?
H/a ?
H/a ?
H/a ?
H/a ?
H/a ?
H/a ?
HSGP
H
H
H
 
Amel X
SGP
SS,UNK
KP
FTS
H
FTS
FTS
HSGP
D
HKP
UNK
 
 
H
 
 
UNK?
UNK?
 
Amel Y*
SGP Ex1,Ex2
FTS,UNK
KP Ex1
FTS
 
FTS
FTS
 
FTS
FTS
FTS
KP Ex2
UNK
H
H
H
H
H
KP Ex3(T4,T7)
FTS,--
--,FTS
FTS,FTS
FTS,--
 
FTS,--
FTS,UNK
FTS,UNK
HSGP Ex2
PB
KP Ex4/5
H
 
H
H
 
FTS
FTS
H
KP Ex8
H
 
H
H
 
FTS
FTS
H
SGP Ex4/5,Ex8
H,H
 
 
 
 
 
 
 
HKP Ex?
 
 
 
H
 
 
UNK?
UNK?
 
HAR1
SGP
SS,UNK
KP
HSGP
PB
 
TAP1
SGP
KP(T7)
UNK
UNK
UNK
UNK
HKP
H
D
HM/p
H
 
H
UNK
UNK

 

Table 2.  * FTS for Amel Y may be because sample is female.  Abbreviations: SGP, results on webpage.   KP, results in Ketchum paper.  HSGP results of Hart, sequence from SGP webpage (Table 1 above).  HKP results of Hart, sequence from Ketchum Paper, Supplementary Data 3(Table 1 above).  H, human.  NI sequence with No Interpretation.   SS, “strange sequence”(SeqWright).  H/a ?, may be among aligned human sequences.  UNK, unknown, i.e. no matches.  UNK?, X or Y not specified.  FTS, failure to sequence.  --, not listed.  Ex1, exon 1.  Ex2, exon 2, etc. T4, Table 4 in Ketchum et al.  T7, Table 7 in Ketchum et al.  D, dog, definitely Canis but could be wolf or coyote.  PB, polar bear.  HM/p, human mitochondrial/partial match.  Results of this study are in aqua rows.  Most remarkable new findings are in yellow.    

So many contradictory results and unknown, FTS, non-primate species (dog, bear), and erroneous (mtDNA) sequences make this data set look more like mixed or mistaken sample provenance or compromised sample handling, sample storage, data handling, or sequencing protocol than anything new.  Clean, single-species samples should amplify across all genes for the species.  However, if multiple species are present, attempting to amplify one in low concentration may result in amplification of other species in the sample, but may not result in sufficient amplification of the target species, depending on target DNA concentration relative to other species present, primers, degree of degradation, and conditions.  Such might be the case if the S26 bear sample were contaminated with dog and human DNA.

Partially degraded samples may also be self-primed by fragments of degradation and backfolding of single strands, observed by Ketchum et al.[10,11]  The mixed single strand-double strand electron photomicrograph of S26 DNA in Fig. 12 of Ketchum et al. more likely represents degradation rather than anything novel and presents just such a morphology conducive to self-priming, in which the double-stranded sections act as primers for the contiguous single strands and/or the backfolded single strands are self-primed.  But the resulting sequences are not the ones expected from the addition of carefully designed primers; rather, they may be from unexplored regions of the genome and therefore “unknown” when searched against the databases.  We suspect this to be the case for the Supplementary Data 3 Sequences 7, 8, 10, 14, 15, 16, and 17 in Table 1.  Even “complete” nuclear genomes may actually only be 80-95% complete.  Most of the remainder is just the sort of “junk sequence” that may not be in the databases, because it is not on an important gene.

Finally, we are not convinced by the Ketchum et al. conclusion that degradation did not occur in S26 because a DNA electron photomicrograph of a “degraded human DNA control sample (Figure 12, panel C),” showed no “single-stranded gaps and single-stranded ends” as did S26.  Simply stated, because the control was not the same type of sample as S26 and was not exposed to the same conditions for the same amount of time, such a result is not relevant, and therefore their conclusion is not valid.  Logically, it’s a false contrapositive, because the original premise that a degraded sample shows no single DNA strands is not true in every case.

The submitter of S26, Justin Smeja, used his dog to locate the sample and probably did not take precautions against contamination by himself or the dog.[12]  This is the likely cause of the mixed results for S25-26 in Table 1.  Another hypothesis might be that the S26 sample is actually the remains of a sasquatch (possibly one that Smeja shot) devoured (and possibly regurgitated) by a bear and possibly also a coyote or wild dog.  This hypothesis could be tested on the original S26 sample using better separation and purification techniques. We did, however, find previously (Table 1 in ref. [3]) that Ursidae matched consistently better than any other species (including human and dog) in the database search of the whole nuclear sequence (called “whole genome” by Ketchum et al.), so we believe that the major species in S26 is a bear.

To determine whether nonhuman species would amplify and sequence with the human Amel X and Amel Y primers used by Ketchum et al., we predicted  human amplicons from the primers in Supplementary Data 12 of Ketchum et al. and intervening bases and searched these strings with BLAST™ for matches to other species. The primers were first aligned against human reference sequences (e.g. NW_001842425.2) of the appropriate chromosome.  The extreme base positions at the far end of each primer (5’ on forward, 3’ on reverse) then defined the amplicon, which in every case was on the correct chromosome and matched the length listed in Supplementary Data 12.  The string of bases between these extreme positions was then searched against the Reference Genomic Sequences Database.  See Table 3. As a check we also used Primer-BLAST™, and obtained the same results. Amel X produced identical results for chimpanzee (Pan troglodytes) and pygmy chimpanzee (Pan paniscus); these should amplify and sequence.  Gorilla (Gorilla gorilla gorilla) and the northern white-cheeked gibbon (Nomascus leucogenys) are questionable with four total primer mutations.  In any case, a human match for Amel X indicates a species more recent than any of the great apes.  Similarly, only the chimpanzee (Pan troglodytes) aligned the primers at the proper locations and produced an amplicon of the correct length (Supplementary Data 12) on the correct gene for Amel Y exons 1, 2, 4/5, and 8.  Hence, any primate between chimpanzee and human on the Evolutionary Tree of Life would be amplified and sequenced at exons 1, 2, 4/5, and 8, and no other, more distant, species of primate or nonprimate would be amplified and sequenced with these primers.  Therefore, we are assured that FTS for any of these four exons in Table 2 above cannot be due to an unknown primate or human hybrid more recent than the chimpanzee; they must signal a more distant species, primate or nonprimate. UNK (unknown sequence) is addressed below.   
Conversely, a “human” match (“H” in Table 2) to an amplicon from these four pairs of primers can only be a human or some human-like primate more recent than the chimpanzee, nothing else.  Not even the gorilla, the pygmy chimpanzee, gibbons, or the orangutan (Pongo abelii) would align or sequence with these four primer pairs (too many mutations vs. the primers).  Interestingly, the green monkey (Chlorocebus sabaeus) matches these Amel Y amplicons well but on the X chromosome.  No other monkeys or nonprimates come close to matching any of these human Amel X and Amel Y amplicons.  Table 3 shows search results which demonstrate these points.
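The amplicon prediction described above (locate both primers on the reference, then take everything between their outer ends) can be sketched as follows.  This is a simplified illustration under stated assumptions: primers must match the reference exactly, the reference is given as its plus strand, and the function names and toy sequences are hypothetical.  Primer-BLAST™, which was used as the check, also allows for primer mismatches, which this sketch does not.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def predict_amplicon(reference, forward_primer, reverse_primer):
    """Return the predicted amplicon (primers included), or None if a primer is not found.

    The forward primer is searched as written on the plus strand; the reverse
    primer binds the minus strand, so its reverse complement is searched
    downstream of the forward primer.
    """
    ref = reference.upper()
    fwd = forward_primer.upper()
    rev_site = reverse_complement(reverse_primer.upper())
    start = ref.find(fwd)
    if start == -1:
        return None
    end = ref.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return ref[start:end + len(rev_site)]


if __name__ == "__main__":
    # Toy reference and primers, not the Supplementary Data 12 primers
    reference = "TTT" + "ACGTAC" + "GGGGG" + "TTGACC" + "AAA"
    amplicon = predict_amplicon(reference, "ACGTAC", "GGTCAA")
    print(amplicon, len(amplicon))
```

The returned string is what would then be searched against the Reference Genomic Sequences database for matches in other species.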


 

 
 
 
Conclusions

It is obvious that Ketchum et al. did not observe basic principles of DNA database search/match as outlined in [3], and they missed some matches that were in databases other than the Nucleotide Collection.  These limited searches resulted in incorrect or nebulous species identifications and false conclusions.  We found this previously.[3]  Further, the conflicting results for different genes of the same sample (even three different species for S25/S26) raise questions about sample provenance and handling.  Such mixed results DO NOT imply a new species, as suggested by Ketchum on the SGP website.  Any new primate species or human/primate hybrid must match human and/or some primate above 95 %ID.  As an example, we matched polar bear (best) and giant panda when there were no black bear data.  Also, both we and Ketchum matched European polecat/domestic ferret (only 83 %ID however) when only the Nucleotide Collection was searched.  It’s a carnivore distantly related to bears.  Genetic hybrids seem unlikely, because they should at least resemble one or more primates or amplify and sequence if they are more recent than the chimpanzee.

However, everything above considered, S33, S35 and S37 might actually be worth pursuing.  They come close to human, though they have some FTS.  S35 has a pristine mtDNA haplogroup H10e with no extra mutations.[5]  S37 has little gene data but also appears to be human with haplogroup H3 and only two extra mutations.[5]  At the other end, S43 and S44 produced few matches.  Also, S44 had 17 extra mtDNA mutations, the worst of 18 samples and with only one chance in 1,606,186,760 of being matrilineally related to modern humans.[5]  Similarly, S39b had 12 extra mtDNA mutations with a probability of one chance in 162,224 of being from the human population.[5]  Samples S10, S25/26, S39b, S43 and S44 have little promise as bigfoot/sasquatch candidates.

Lessons learned from this study are:

Search all databases.

If unsuccessful, look for members of the same genus or family.

Failing these, the unknown sequence must be VERY different from ANY whole genome sequence AND be from an unexplored region of an incompletely sequenced genome (e.g. an uncommon or unimportant gene) or from an unsequenced genome, OR there must be something wrong with the data, which could be a “junk sequence” not previously identified and not amplified by the primers.

The phylogeny of the planet’s life is sufficiently well known, the species are sufficiently related to one another through the Evolutionary Tree of Life, and their DNA is sufficiently sequenced that a totally new form of life with NO match to ANY existing species is very unlikely at this time, especially when common/important genes are “sequenced.”  Certainly such a form is not likely to be a primate, since primates are all closely related and belong to the most studied mammalian order.  Therefore, purported primate sequences which yield no matches must be considered very likely to be based on bad or junk data.  When samples fail to amplify or sequence (FTA, FTS), either the wrong primers were used, PCR conditions were inappropriate, or there was insufficient target DNA in the sample.  When samples amplify and sequence, especially with primers for important genes, but do not match any known species even remotely, the sequences are suspect and may be from degraded or intractable mixtures.  Claims of a new primate species under either of these circumstances (FTA/FTS or unknown sequence) are totally unwarranted.

When we began our studies a year and a half ago, we had high hopes for the discovery of a new hominid species.  It was the natural reaction of a lifelong amateur naturalist.  Now, after three exhaustive studies[3,5] of the Ketchum et al. results and conclusions[1], we are left with the disappointing overarching conclusion that the existence of bigfoot or sasquatch was not proven by them.  This does not mean that bigfoot/sasquatch does not exist, that the tireless efforts of many field workers are wasted effort, that laboratory protocols are invalid, or that all the reports of many observers are inaccurate.  Hopefully, application of our methodology will avoid future mistakes in interpretation if good DNA data is eventually obtained.  

Note in Passing

An early version of Table 1 was shared privately with Dr. Ketchum soon after the posting of the gene sequences on the SGP webpage, with the hope that admitting her mistakes might avoid more controversy and similar mistakes in the future.  No response was received publicly or privately, so this author had no responsible scientific alternative but to make these mistakes public in this paper.

Conflict of Interest

The author declares no conflict of interest.

Acknowledgement
Thanks go to the Sasquatch Genome Project for making their sequences available online.  The author received no financial support for this work.



References

[1]        Ketchum, M. S. et al. Novel North American Hominins: Next Generation Sequencing of Three Whole Genomes and Associated Studies. DeNovo, 2013, 1:1, Online only: http://sasquatchgenomeproject.org/view-dna-study/

[2]        Khan, T.; White, B.  Final Report on the Analysis of Samples Submitted by Tyler Huggins, Wildlife Forensic DNA Laboratory Case File 12-019; Trent University Oshawa: Peterborough, Ontario, Canada, 2012.

 

[3]        Hart, H. V.  Methodology and New Metrics for Distinguishing Related Species from Incomplete nuDNA. Unpublished. http://bigfootforums.com/index.php/topic/40487-the-ketchum-report-part-3/page-30?hl=ketchum#entry837515

 

 

[4]        Sykes, B. C.; Mullis, R. A.; Hagenmuller, C.; Melton, T. W.; Sartori, M.  Genetic Analysis of Hair Samples Attributed to Yeti, Bigfoot and Other Anomalous Primates.  Proc. R. Soc. B, 2014, 281, 20140161.

 

[5]       Hart, H. V.  “But the mtDNA Sequences are all Human…”  Really?  https://www.facebook.com/groups/smartbigfoot/    and
            http://bigfootforums.com/index.php/topic/40487-the-ketchum-report-part-3/page-31

[6]        Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J.  Basic Local Alignment Search Tool.  J. Mol. Biol., 1990, 215 (3), 403-410.

[7]        Madden, T. The BLAST Sequence Analysis Tool, In The NCBI Handbook; McEntyre J; Ostell J., Eds.; National Center for Biotechnology Information: Bethesda, MD, 2003; http://www.ncbi.nlm.nih.gov/books/NBK21097/.

[8]        Li, R., et al.  The Sequence and De Novo Assembly of the Giant Panda genome.  Nature, 2010, 463, 311-317.

[9]        Liu, S., et al.  Population Genomics Reveal Recent Speciation and Rapid Evolutionary Adaptation in Polar Bears.  Cell, 2014, 157 (4), 785-794.

 

[10]      Levin, H. L. A Novel Mechanism of Self-primed Reverse Transcription Defines a New Family of Retroelements.  Mol. Cell. Biol. 1995, 15(6), 3310-3317.

[11]      Whitcombe, D. M., Theaker, J., Gibson, N. J., Little, S.  Methods for detecting target nucleic acid sequences.  US Patent 6326145. 2001.

[12]      Greene, M. D., Sasquatch for Sale: Death, DNA, and Duplicity, San Bernardino, CA, 2014, p. 232.




