Specific Gene DNA Sequences: The Ketchum et al. Mistakes Revealed
By Haskell V. Hart
August 26, 2014
© 2014, Haskell V. Hart, all rights reserved. May be reproduced for individual use only.
Abstract
Specific nuclear gene DNA sequences from the Ketchum et al. study of purported bigfoot/sasquatch samples are identified to species and compared to their results. Serious errors in interpretation were found in the latter, which were tied to incomplete searches of the available DNA databases. Nevertheless, a few human-like samples might merit further investigation. The overarching conclusion of this and two previous studies is that bigfoot/sasquatch was not proven to exist by Ketchum et al.
Introduction
The Ketchum paper, “Novel North American Hominins: Next Generation Sequencing of Three Whole Genomes and Associated Studies,”[1] gives no details of BLAST™ search/match strategy or results, but claims “homology” to humans for various DNA sequences. Previous work showed that two of the three nuDNA sequence interpretations were incorrect, with best matches to a bear and a dog,[2,3,4] and that eight of 18 complete mtDNA sequences are statistical outliers with less than 1% probability of occurring in the human population.[5]
The more recent Sasquatch Genome Project (SGP) website, www.sasquatchgenomeproject.org, reveals more mistakes of omission, this time with enough results and narrative to demonstrate clearly where Ketchum et al. went wrong. The present paper was written to explain these mistakes and to show that, had the basic principles[3] of this kind of search/match been followed, publication of such a paper[1] with unsupported claims would have been avoided.
Computer Methods
We used the same search/match software that was used by Ketchum et al., BLAST™ [6,7], http://blast.ncbi.nlm.nih.gov/Blast.cgi, for matching (aligning) a sequence with the databases of known sequences, which are maintained by the National Center for Biotechnology Information (NCBI), http://www.ncbi.nlm.nih.gov/. Sequences were taken from the SGP website, “Supplemental Raw Data” tab, and from “Supplementary Data 3” in the original paper[1], also available on the SGP website under the “View DNA Study” tab. The SGP sequences were cut and pasted into a text file, and a title row beginning with “>” was added to conform to FASTA format as required by BLAST™. The Supplementary Data 3 sequences were already in FASTA format, ready to paste as BLAST™ input. A variation, Primer-BLAST™, was used to search Amel X and Amel Y amplicons (including primers) for possible coincidental, nonhuman matches. Online help adequately described the input requirements.
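The FASTA preparation step described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for this paper (the sequences were edited by hand in a text file). The function name to_fasta, the 70-character line width, and the restriction to the letters A, C, G, T and N are assumptions made for the example.

```python
def to_fasta(name, raw_sequence, width=70):
    """Wrap a pasted nucleotide string in FASTA format.

    The first line is a title row beginning with '>'; the bases follow,
    broken into lines of at most `width` characters.
    """
    # Keep letters only: position numbers, spaces and line breaks
    # often survive a cut and paste from a web page.
    bases = "".join(ch for ch in raw_sequence.upper() if ch.isalpha())
    # Assumption for this sketch: only A, C, G, T and N are accepted.
    unexpected = set(bases) - set("ACGTN")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    lines = [bases[i:i + width] for i in range(0, len(bases), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical toy sequence, not one of the SGP sequences.
    print(to_fasta("demo_sequence", "acgt 11 acgt\nnnac", width=10))
```

The resulting text can be pasted directly into the BLAST™ query box.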
All NCBI databases and BLAST™ software are available free to the public, making it possible for anyone to validate the results presented below. One need not be a geneticist; one simply needs to spend some time learning the nuances of the search software and the databases.
Results and Discussion
The “Supplemental Raw Data” tab on the SGP website opens a dissertation by Dr. Ketchum on validating sequence electropherograms and matching specific gene sequences with species in the database, evidently only the “Nucleotide Collection” database, as our results will show. Its purpose was “to put to rest any questions about our study and the data quality.” Our issue is with the interpretation of some of these sequences, that is, the identifications.
Table 1 presents our search/match results for the Ketchum sequences in the website’s Supplemental Raw Data (top part) and in the Ketchum et al. paper[1] (bottom part).
Table 1. Search/Match Results for Specific Gene Sequences
| Gene | Search/Accession Match | %ID | Length | Score | ID |
|---|---|---|---|---|---|
| From SGP Website | | | | | |
| MC1R | SGP S25-26 MC1R vs. Nucleotide | | | | |
| | JN575070.1 | 86.02 | 794 | 852 | black bear |
| | AY884206.1 | 83.58 | 615 | 577 | giant panda |
| | AB598380.1 | 100 | 824 | 1522 | human |
| PNLIP | SGP S25-26 PNLIP vs. Nucleotide | | | | |
| | NG_023311.1 | 100 | 215 | 398 | human |
| | AL731653.1 | 100 | 215 | 398 | human |
| M16 (My16) | SGP S25-26 M16 vs. Nucleotide | | | | |
| | AC005163.3 | 100 | 235 | 435 | human |
| | BK001410.1 | 100 | 235 | 435 | human |
| Amel X | SGP S25-26 AmelX vs Nucleotide | | | | Unknown |
| | SGP S25-26 AmelX vs RefSeq | | | | |
| | NC_006585.3 | 99.46 | 186 | 337 | dog |
| | NW_003726054.1 | 99.46 | 186 | 337 | dog |
| | SGP S25-26 AmelX vs Genomes (chromosome) | | | | |
| | NC_006585.3 | 99.46 | 186 | 337 | dog |
| | SGP S25-26 Amel X vs WGS | | | | |
| | AOCS01003912.1 | 99.46 | 186 | 337 | dog |
| | AAEX03002445.1 | 99.46 | 186 | 337 | dog |
| | AACN010745680.1 | 99.46 | 186 | 337 | dog |
| Amel Y Exon2 | SGP S25-26 Amel Y Ex2 vs Nucleotide | | | | |
| | XM_004792766.1 | 83.06 | 484 | 412 | domestic ferret / European polecat * |
| | SGP S25-26 Amel Y Ex2 vs RefSeq | | | | |
| | NW_007907173.1 (NEW August 4, 2014) | 100 | 463 | 856 | polar bear |
| | NW_003219866.1 | 95.27 | 465 | 734 | giant panda |
| | SGP S25-26 Amel Y Ex2 vs WGS | | | | |
| | AVOR01061180.1 (NEW August 12, 2014) | 100 | 463 | 856 | polar bear |
| | ACTA01188137.1 | 95.27 | 465 | 734 | giant panda |
| | SGP S25-26 Amel Y Ex 2 vs Genomes | | | | |
| | AVOR01061180.1 (NEW August 12, 2014) | 100 | 464 | 856 | polar bear |
| | ACTA01188137.1 | 95.27 | 465 | 734 | giant panda |
| | SGP S25-26 Amel Y Ex2 vs TSA | | | | |
| | AVOR01061180.1 (NEW August 12, 2014) | 100 | 463 | 856 | polar bear |
| | ACTA01188137.1 | 95.27 | 465 | 734 | giant panda |
| HAR1 | SGP S25-26 HAR1 vs Nucleotide | | | | Unknown |
| | SGP S25-26 HAR1 vs RefSeq | | | | |
| | NW_007907114.1 (NEW August 4, 2014) | 98.08 | 208 | 363 | polar bear |
| | NW_003219288.1 | 92.86 | 238 | 346 | giant panda |
| | SGP S25-26 HAR1 vs Genomes | | | | |
| | NW_007907114.1 (NEW August 4, 2014) | 98.08 | 208 | 363 | polar bear |
| | SGP S25-26 HAR1 vs TSA | | | | Unknown |
| | SGP S25-26 HAR1 vs WGS | | | | |
| | AVOR01033236.1 (NEW August 12, 2014) | 98.08 | 208 | 363 | polar bear |
| | ACTA01148086.1 | 92.86 | 238 | 346 | giant panda |
| From Supplementary Data 3 of the Ketchum et al. paper | | | | | |
| S26 & S35 TAP1 | Sequence3 S26 TAP1 vs Nucleotide | | | | |
| | AB528393.1 | 100 | 482 | 891 | human |
| S10 TAP1 | Sequence5 S10 TAP1 vs Nucleotide | | | | |
| | AC190393.6 | 95.77 | 189 | 307 | dog |
| | AC188661.8 | 95.77 | 189 | 307 | dog |
| | Sequence5 S10 TAP1 vs RefSeq | | | | |
| | NC_006588.3 | 99.18 | 245 | 444 | dog |
| | NW_003726065.1 | 99.18 | 245 | 444 | dog |
| S33 TAP1 | Sequence6 S33 TAP1 vs Nucleotide | | | | |
| | KF523403.1 | 100 | 169 | 313 | human mtDNA |
| | JX669424.1 | 100 | 169 | 313 | human mtDNA |
| | Sequence6 S33 TAP1 vs Human G+T | | | | |
| | NC_012920.1 | 99.41 | 169 | 307 | human mtDNA |
| | Sequence6 S33 TAP1 vs RefSeq | | | | |
| | NC_011137.1 | 100 | 169 | 313 | human mtDNA |
| S43 TAP1 | Sequence7 S43 TAP1 vs Nucleotide | | | | Unknown |
| | Sequence7 S43 TAP1 vs RefSeq | | | | Unknown |
| | Sequence7 S43 TAP1 vs Genomes | | | | Unknown |
| | Sequence7 S43 TAP1 vs TSA | | | | Unknown |
| | Sequence7 S43 TAP1 vs WGS | | | | Unknown |
| | Sequence7 S43 TAP1 vs Mouse G+T | | | | Unknown |
| | Sequence7 S43 TAP1 vs Human G+T | | | | Unknown |
| | Nine additional databases searched.** | | | | Unknown |
| S44 TAP1 | Sequence8 S44 TAP1 vs Nucleotide | | | | Unknown |
| | Sequence8 S44 TAP1 vs RefSeq | | | | Unknown |
| | Sequence8 S44 TAP1 vs Genomes | | | | Unknown |
| | Sequence8 S44 TAP1 vs TSA | | | | Unknown |
| | Sequence8 S44 TAP1 vs WGS | | | | Unknown |
| | Sequence8 S44 TAP1 vs Mouse G+T | | | | Unknown |
| | Sequence8 S44 TAP1 vs Human G+T | | | | Unknown |
| | Nine additional databases searched.** | | | | Unknown |
| S39b TAP1 | Sequence9 S39b TAP1 vs Nucleotide | | | | |
| | AB528393.1 | 100 | 373 | 689 | human |
| S35 & S37 My16 | Sequence1 S35 My16 & Sequence2 S37 My16 vs Nucleotide | | | | |
| | AC005163.3 | 100 | 298 | 551 | human |
| | BK001410.1 | 100 | 298 | 551 | human |
| S26 Amel X | Sequence10 S26 Amel X vs Nucleotide | | | | Unknown |
| | S26 Amel X vs Genomes | | | | Unknown |
| | S26 Amel X vs TSA | | | | Unknown |
| | S26 Amel X vs RefSeq | | | | Unknown |
| | S26 Amel X vs WGS | | | | Unknown |
| | S26 Amel X vs Mouse G+T | | | | Unknown |
| | S26 Amel X vs Human G+T | | | | Unknown |
| | Nine additional databases searched.** | | | | Unknown |
| S35 Amel X | Sequence12 S35 AmelX vs Nucleotide | | | | |
| | AY694861.1 | 99.19 | 619 | 1125 | human |
| | NG_012494.1 | 99.51 | 610 | 1116 | human |
| S35 Amel Y | Sequence13 S35 AmelY vs. Nucleotide | | | | |
| | NG_008011.1 | 99.86 | 701 | 1290 | human |
| S43 “Amel” | Sequence14 S43 “Amel” vs Nucleotide | | | | Unknown |
| | Sequence14 S43 “Amel” vs RefSeq | | | | Unknown |
| | Sequence14 S43 “Amel” vs Genomes | | | | Unknown |
| | Sequence14 S43 “Amel” vs TSA | | | | Unknown |
| | Sequence14 S43 “Amel” vs WGS | | | | Unknown |
| | Sequence14 S43 “Amel” vs Mouse G+T | | | | Unknown |
| | Sequence14 S43 “Amel” vs Human G+T | | | | Unknown |
| | Nine additional databases searched.** | | | | Unknown |
| S44 & S43 “Amel” | Sequence15 S44 & Sequence16 S43 “Amel” vs Nucleotide | | | | Unknown |
| | Sequence15 S44 & Sequence16 S43 “Amel” vs RefSeq | | | | Unknown |
| | Sequence15 S44 & Sequence16 S43 “Amel” vs Genomes | | | | Unknown |
| | Sequence15 S44 & Sequence16 S43 “Amel” vs TSA | | | | Unknown |
| | Sequence15 S44 & Sequence16 S43 “Amel” vs WGS | | | | Unknown |
| | Sequence15 S44 & Sequence16 S43 “Amel” vs Mouse G+T | | | | Unknown |
| | Sequence15 S44 & Sequence16 S43 “Amel” vs Human G+T | | | | Unknown |
| | Nine additional databases searched.** | | | | Unknown |
| S44 “Amel” | Sequence17 S44 “Amel” vs Nucleotide | | | | Unknown |
| | Sequence17 S44 “Amel” vs TSA | | | | Unknown |
| | Sequence17 S44 “Amel” vs RefSeq | | | | Unknown |
| | Sequence17 S44 “Amel” vs Genomes | | | | Unknown |
| | Sequence17 S44 “Amel” vs WGS | | | | Unknown |
| | Sequence17 S44 “Amel” vs Mouse G+T | | | | Unknown |
| | Sequence17 S44 “Amel” vs Human G+T | | | | Unknown |
Table 1. * The accession was renamed 6-13-13 as domestic ferret (Mustela putorius furo), a subspecies of European polecat (Mustela putorius). ** Every NCBI database was searched. Some have redundant data. Abbreviations: Query sequence and NCBI database underlined. Nucleotide = Nucleotide Collection. RefSeq = Reference Genomic Sequence. TSA = Transcriptome Shotgun Assembly. WGS = Whole Genome Shotgun Contigs. Human G+T = Human Genomic plus Transcript. Mouse G+T = Mouse Genomic plus Transcript. Genomes = NCBI Genomes (Chromosome). %ID = percentage of aligned base pairs that are identical. Length = number of consecutive base pairs which align. Score = see [3,6,7] and http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html. ID = identity of matching sequence. Unknown = no matches, “No significant similarity found.” This is different from FTS (failure to sequence) in Table 2. Dog = definitely Canis, but could be wolf or coyote. “Amel” = no X or Y was reported by Ketchum et al.
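As a rough aid to reading the %ID and Length columns, the arithmetic that links them can be sketched as follows. This assumes the usual BLAST convention that %ID is the number of identical positions divided by the alignment length; the function names are hypothetical, and the non-identical positions counted here may be either mismatches or gaps.

```python
def percent_identity(identical_positions, alignment_length):
    """%ID as reported to two decimals: identical positions / alignment length."""
    return round(100 * identical_positions / alignment_length, 2)


def non_identical_positions(percent_id, alignment_length):
    """Estimate how many aligned positions differ, given %ID and Length."""
    return round(alignment_length * (1 - percent_id / 100))


if __name__ == "__main__":
    # Values taken from Table 1: Amel X vs dog and Amel Y Exon 2 vs giant panda.
    for label, pid, length in [("Amel X / dog", 99.46, 186),
                               ("Amel Y Ex2 / giant panda", 95.27, 465)]:
        print(label, non_identical_positions(pid, length), "differing positions")
```

Under this reading, the 99.46 %ID dog match over 186 bases corresponds to a single differing position, while the 95.27 %ID giant panda match over 465 bases corresponds to about 22.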
Beginning at the top with Samples 25 & 26 (hair and flesh, respectively, of the same sample), we agree with the SeqWright interpretations of human for the genes MC1R (Melanocortin 1 Receptor) and PNLIP (Pancreatic Lipase), as indicated in their table near the top of the webpage; Ketchum also found these to be human, with accession numbers identical to ours. The M16 (My16, MYH16: Myosin 16 Heavy Chain) sequence is also human, but was given no interpretation by Ketchum on the website.
Moving down to the next gene, Amel (Amelogenin) X, SeqWright called the sequence “strange,” but we found that it matched dog (Canis lupus familiaris) sequences at 99.46 %ID for all 186 bases in the Genomic Reference Sequence (RefSeq), Genomes (chromosome), and Whole Genome Shotgun (WGS) databases, and matched nothing in the Nucleotide database, which we suspect was the only database SeqWright queried. We do not agree with the mixed human and dog interpretation of Ketchum on the website, which was based on her Nucleotide Collection (only) alignment of only 73 and 89 bases, respectively, with 88-91 %ID. Accession dates for our best matches were 2003 and 2006, well before the initiation of the Ketchum et al. study and the SGP website, so evidently these databases were not searched by her. Ketchum said it was “loosely aligned” with dog. We found 99.46 %ID alignment with dog, which, given the hybrid nature of dogs, is very well aligned. IT’S A DOG (or a wolf or coyote). Curiously, this sequence does not align with the Amel X sequence (Sequence 10) in Supplementary Data 3 mentioned below, which is also listed as failure to sequence in the Ketchum paper Table 4 (see Table 2 below). We found no matches for Sequence 10. Dog, FTS, and unknown, all for the same gene? It cannot be a single species.
Next on the list is Amel Y Exon 2. Called a “strange sequence” by SeqWright and “unknown” by Ketchum on the website and in Table 4 of the paper, it is anything but. IT’S A BEAR, not an “unknown” species. The sequence matches polar bear (best) and giant panda in three databases with 100 and 95.27 %ID, respectively. Again Ketchum and associates came up with Nucleotide Collection ONLY matches of only 77 %ID for Rhesus monkey and 83 %ID for European polecat (a ferret). The BLAST™ output screen reproduced on the webpage confirms the database as Nucleotide Collection: “Database Name nr” and “Description nucleotide collection (nt),” and we got the same result of European polecat that Ketchum did for that database ALONE. No other searches were mentioned or implied on the website. The primers S1100424.DND-CP29 and S1100424.DND-CP30 from Supplementary Data 12 of Ketchum et al. for human Amel Y Exon 2 would not amplify the sequence on the webpage, nor does this sequence match (“No significant similarity found”) the predicted human amplicon for this gene using these primers. (See Table 3 below.)
In California a wild polar bear or giant panda is very unlikely. However, due to their threatened status, the polar bear (Ursus maritimus) and the giant panda (Ailuropoda melanoleuca) have far more sequence data in the databases, including complete genomes,[8,9] than the black bear (Ursus americanus), the previously proven identity of the sample.[2,3,4] The match to polar bear is better than that to panda because the polar and black bears are in the same genus. The panda, the sole member of its genus, is a more distant relative (22-24 MyBP split) and is often called a “living fossil”.
Throughout our previous study[3] we found 12 rules to apply to this kind of species identification. Three are especially applicable here:
1. “One cannot match what is not in the database.”
4. “Good matches to closely related species at these levels (genus, family) may indicate that the species of interest is relatively under-represented in the database compared to its kin.”
9. “Nucleotide, Genome plus Transcription, Reference Sequence and Shotgun Assembly databases should be searched.”
These principles must always be kept in mind whenever a search results in no matches or in good but not exact matches (~95-99 %ID).
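The practical effect of rule 9 can be illustrated with the Amel Y Exon 2 rows of Table 1. The sketch below is not part of the original analysis, which was done interactively in BLAST™; the Hit record and the best_hit helper are hypothetical names, and ranking by Score alone is a simplifying assumption.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    database: str
    accession: str
    percent_id: float
    length: int
    score: int
    organism: str


# Amel Y Exon 2 hits copied from Table 1 (S25-26, SGP website sequence).
AMEL_Y_EX2_HITS = [
    Hit("Nucleotide", "XM_004792766.1", 83.06, 484, 412,
        "domestic ferret / European polecat"),
    Hit("RefSeq", "NW_007907173.1", 100.0, 463, 856, "polar bear"),
    Hit("RefSeq", "NW_003219866.1", 95.27, 465, 734, "giant panda"),
    Hit("WGS", "AVOR01061180.1", 100.0, 463, 856, "polar bear"),
    Hit("WGS", "ACTA01188137.1", 95.27, 465, 734, "giant panda"),
]


def best_hit(hits, databases=None):
    """Return the highest-scoring hit, optionally limited to some databases."""
    pool = [h for h in hits if databases is None or h.database in databases]
    return max(pool, key=lambda h: h.score) if pool else None


if __name__ == "__main__":
    print("Nucleotide only:", best_hit(AMEL_Y_EX2_HITS, {"Nucleotide"}).organism)
    print("All databases:  ", best_hit(AMEL_Y_EX2_HITS).organism)
```

Restricting the search to the Nucleotide Collection leaves only the poor ferret/polecat match, whereas including the other databases brings the bear matches to the top.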
Finally, for the HAR1 (Human Accelerated Region 1) gene, Ketchum on the SGP website shows “No significant similarity found” in the reproduced BLAST™ output screen and called it a “Completely unknown sequence.” Again, only the Nucleotide Collection was searched, or at least it was the only output presented on the website. We found matches to polar bear and giant panda in three OTHER databases with 98.08 and 92.86 %ID, respectively. It matches nothing else well; the next closest, but poor, matches are the Weddell seal (83 %ID) and the northern Pacific walrus (87 %ID); human is in 58th place on the list by score. Interestingly, the scores correlated well with the known phylogeny of Carnivora, thereby reinforcing our match to bear. As above, black bear is the more likely actual origin of the sample. IT’S A BEAR. If the %ID is not good enough for an exact species match (>99%) and/or a matching species is way out of its range, such a finding should not be dismissed as “impossible”; here it’s a clue that a bear is likely, but more data are needed to decide which bear. As in the case of Amel Y Exon 2 above, we could not align the HAR1 sequence from the website with the predicted human amplicon using the HAR primers in Supplementary Data 12, or indeed with anything human, so we do not believe this sequence could have been amplified by these primers. Its gene is unknown.
It must be mentioned here that as this paper was nearing completion, new polar bear data were added to the databases (so indicated in Table 1). However, the giant panda data were there well before the publication of the Ketchum et al. paper. Had they searched any one of the three additional databases in Table 1, they would have found a good match to the giant panda for the Amel Y Exon 2 and HAR1 genes.
This concludes the
discussion of the SGP website data from S25 & S26.
On to the second part of Table 1; these are sequences from the Ketchum et al. paper, Supplementary Data 3. The S26 and S35 TAP1 (Antigen Peptide Transporter 1) sequences were identical. We found 100 %ID human for 482 bases, which agrees with SeqWright for S26. S39b was 100 %ID human as well. There is no specific mention of S35 or S39b TAP1 in the text of the Ketchum paper or on the SGP website (yet they are in Supplementary Data 3).
Table 7 in Ketchum et al. reports “Unknown” for TAP1 sequences of Samples 10, 33, 43, and 44. We convincingly matched dog (99.18 %ID) and human mitochondrial (100 %ID) sequences for S10 and S33, respectively. S33 positions 1-313 align with the minus (reverse) strand of S44, positions 313 to 1, neither of which found a match. Mitochondrial sequences matched S33 positions 424-592 (end). A mitochondrial sequence should not be primed and sequenced by TAP1 nuclear gene primers, nor should it be part of a strand with an “unknown” segment, since the entire human mitochondrial sequence is known and in the database. Also, a plus strand matching a minus strand in two samples presumably amplified and sequenced with the same primers is bizarre. Hence, as discussed below, multiple, uncontrolled amplification and sequencing must have occurred with S33 and, by implication, possibly S44. S10 and S43 do not align as claimed in Ketchum et al. Table 7; otherwise S43 would have matched dog too (Table 1). S43 and S44 had no matches, which is consistent with Ketchum et al. Table 7 (“Unknown”).
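The plus/minus strand relationship noted above for S33 and S44 can be checked with a reverse-complement comparison. The following is a minimal sketch using exact string matching only; BLAST™ itself tolerates mismatches and gaps, so this is not a substitute for it. The function names and the toy sequences are hypothetical.

```python
# Complement table for the four bases plus the ambiguity code N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the minus-strand reading of a plus-strand sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def strand_of_match(query, subject):
    """Report whether query occurs in subject on the plus or minus strand.

    Returns "plus", "minus", or None when neither orientation matches exactly.
    """
    q, s = query.upper(), subject.upper()
    if q in s:
        return "plus"
    if reverse_complement(q) in s:
        return "minus"
    return None


if __name__ == "__main__":
    # Toy example: the query is found only after reverse-complementing it.
    print(strand_of_match("ATGC", "TTGCATTT"))
```

A "minus" result for two sequences supposedly produced with the same primers is the kind of oddity described in the text.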
S35 and S37 My16 sequences are identical and match human at 100 %ID. The Ketchum et al. paper states, “…all DNA samples that successfully amplified yielded results consistent with human and aligned with the human reference sequence…” However, we are not told which samples those are.
Curiously, the S26 Amel
X sequence in Supplementary Data 3 (Sequence 10) does not match the S26 Amel X
sequence above on the website (dog), nor does it match any sequence in any of the
databases. No explanation was given in
either location.
S35 Amel X is human, but was listed in Table 4 of the Ketchum paper as failure to sequence and in Ketchum Table 5 as having X allele dropout (Y only). S35 Amel Y (unknown exon) is also human, as were three of its five exons in Ketchum et al. Table 4. The other two exons failed to sequence.
Sequence 14 S43 “Amel”,
Sequence 15 S44 “Amel”, Sequence 16 S43 “Amel”, and Sequence 17 S44 “Amel” did
not match any sequence in the
databases. No X, Y or exon designation
is given in Supplementary Data 3.
Curiously, Sequences 15 and 16 are identical, and Sequences 14 and 17
match except for 12 extra bases at the end of 14. These sequences are not adequately labeled. Are some Amel X and others Amel Y?
Table 2 summarizes all of the above results from Table 1. Other related results from the Ketchum et al. paper (Tables 4 and 7) are added for comparison.
Table 2. Summary of Gene Sequence Identification
|
Sample
|
||||||||
S25 & 26
|
S10
|
S33
|
S35
|
S37
|
S39b
|
S43
|
S44
|
||
Gene
|
Source
|
||||||||
MC1R
|
SGP
|
H
|
|||||||
KP
|
H
|
H
|
H
|
||||||
HSGP
|
H
|
||||||||
|
|||||||||
PNLIP
|
SGP
|
H
|
|||||||
KP
|
|||||||||
HSGP
|
H
|
||||||||
|
|||||||||
M16 (My16)
|
SGP
|
NI
|
|||||||
KP
|
H/a ?
|
H/a ?
|
H/a ?
|
H/a ?
|
H/a ?
|
H/a ?
|
H/a ?
|
H/a ?
|
|
HSGP
|
H
|
H
|
H
|
||||||
|
|||||||||
Amel X
|
SGP
|
SS,UNK
|
|||||||
KP
|
FTS
|
H
|
FTS
|
FTS
|
|||||
HSGP
|
D
|
||||||||
HKP
|
UNK
|
|
|
H
|
|
|
UNK?
|
UNK?
|
|
|
|||||||||
Amel Y*
|
SGP Ex1,Ex2
|
FTS,UNK
|
|||||||
KP Ex1
|
FTS
|
|
FTS
|
FTS
|
|
FTS
|
FTS
|
FTS
|
|
KP Ex2
|
UNK
|
H
|
H
|
H
|
H
|
H
|
|||
KP Ex3(T4,T7)
|
FTS,--
|
--,FTS
|
FTS,FTS
|
FTS,--
|
|
FTS,--
|
FTS,UNK
|
FTS,UNK
|
|
HSGP Ex2
|
PB
|
||||||||
KP Ex4/5
|
H
|
|
H
|
H
|
|
FTS
|
FTS
|
H
|
|
KP Ex8
|
H
|
|
H
|
H
|
|
FTS
|
FTS
|
H
|
|
SGP Ex4/5,Ex8
|
H,H
|
|
|
|
|
|
|
|
|
HKP Ex?
|
|
|
|
H
|
|
|
UNK?
|
UNK?
|
|
|
|||||||||
HAR1
|
SGP
|
SS,UNK
|
|||||||
KP
|
|||||||||
HSGP
|
PB
|
||||||||
|
|||||||||
TAP1
|
SGP
|
||||||||
KP(T7)
|
UNK
|
UNK
|
UNK
|
UNK
|
|||||
HKP
|
H
|
D
|
HM/p
|
H
|
|
H
|
UNK
|
UNK
|
Table 2. * FTS for Amel Y may be because the sample is female. Abbreviations: SGP, results on webpage. KP, results in Ketchum paper. HSGP, results of Hart, sequence from SGP webpage (Table 1 above). HKP, results of Hart, sequence from Ketchum paper, Supplementary Data 3 (Table 1 above). H, human. NI, sequence with no interpretation. SS, “strange sequence” (SeqWright). H/a ?, may be among aligned human sequences. UNK, unknown, i.e. no matches. UNK?, X or Y not specified. FTS, failure to sequence. --, not listed. Ex1, exon 1. Ex2, exon 2, etc. T4, Table 4 in Ketchum et al. T7, Table 7 in Ketchum et al. D, dog, definitely Canis but could be wolf or coyote. PB, polar bear. HM/p, human mitochondrial/partial match. Results of this study are in aqua rows. Most remarkable new findings are in yellow.
So many contradictory results and unknown, FTS, non-primate species (dog, bear), and erroneous (mtDNA) sequences make this data set look more like mixed or mistaken sample provenance or compromised sample handling, sample storage, data handling, or sequencing protocol than anything new. Clean, single-species samples should amplify across all genes for the species. However, if multiple species are present, attempting to amplify one in low concentration may result in amplification of other species in the sample, but may not result in sufficient amplification of the target species, depending on target DNA concentration relative to other species present, primers, degree of degradation, and conditions. Such might be the case if the S26 bear sample were contaminated with dog and human DNA. Partially degraded samples may also be self-primed by fragments of degradation and by backfolding of single strands, observed by Ketchum et al.[10,11] The mixed single strand-double strand electron photomicrograph of S26 DNA in Fig. 12 of Ketchum et al. more likely represents degradation than anything novel, and it presents just such a morphology conducive to self-priming, in which the double-stranded sections act as primers for the contiguous single strands and/or the backfolded single strands are self-primed. But the resulting sequences are not the ones expected from the addition of carefully designed primers; rather, they may be from unexplored regions of the genome and therefore “unknown” when searched against the databases. We suspect this to be the case for the Supplementary Data 3 Sequences 7, 8, 10, 14, 15, 16, and 17 in Table 1. Even “complete” nuclear genomes may actually be only 80-95% complete. Most of the remainder is just the sort of “junk sequence” that may not be in the databases, because it is not on an important gene. Finally, we are not convinced by the Ketchum et al. conclusion that degradation did not occur in S26 because a DNA electron photomicrograph of a “degraded human DNA control sample (Figure 12, panel C),” showed no “single-stranded gaps and single-stranded ends” as did S26. Simply stated, because the control was not the same type of sample as S26 and was not exposed to the same conditions for the same amount of time, such a result is not relevant, and therefore their conclusion is not valid. Logically, it’s a false contrapositive, because the original premise that a degraded sample shows no single DNA strands is not true in every case.
The submitter of S26, Justin Smeja, used his dog to locate the sample and probably did not take precautions against contamination by himself or the dog.[12] This is the likely cause of the mixed results for S25-26 in Table 1. Another hypothesis might be that the S26 sample is actually the remains of a sasquatch (possibly one that Smeja shot) devoured (and possibly regurgitated) by a bear and possibly also a coyote or wild dog. This hypothesis could be tested on the original S26 sample using better separation and purification techniques. We did, however, find previously (Table 1 in ref. [3]) that Ursidae matched consistently better than any other taxon (including human and dog) in the database search of the whole nuclear sequence (called “whole genome” by Ketchum et al.), so we believe that the major species in S26 is a bear.
To determine whether nonhuman species would amplify and sequence with the human Amel X and Amel Y primers used by Ketchum et al., we predicted human amplicons from the primers in Supplementary Data 12 of Ketchum et al. and the intervening bases, and searched these strings with BLAST™ for matches to other species. The primers were first aligned against human reference sequences (e.g. NW_001842425.2) of the appropriate chromosome. The extreme base positions at the far end of each primer (5’ on forward, 3’ on reverse) then defined the amplicon, which in every case was on the correct chromosome and matched the length listed in Supplementary Data 12. The string of bases between these extreme positions was then searched against the Reference Genomic Sequences database. See Table 3. As a check we also used Primer-BLAST™ and obtained the same results. Amel X produced identical results for chimpanzee (Pan troglodytes) and pygmy chimpanzee (Pan paniscus); these should amplify and sequence. Gorilla (Gorilla gorilla gorilla) and the northern white-cheeked gibbon (Nomascus leucogenys) are questionable, with four total primer mutations. In any case, a human match for Amel X indicates a species more recent than any of the great apes. Similarly, only the chimpanzee (Pan troglodytes) aligned the primers at the proper locations and produced an amplicon of the correct length (Supplementary Data 12) on the correct gene for Amel Y exons 1, 2, 4/5, and 8. Hence, any primate between chimpanzee and human on the Evolutionary Tree of Life would be amplified and sequenced at exons 1, 2, 4/5, and 8, and no other, more distant, species of primate or nonprimate would be amplified and sequenced with these primers. Therefore, we are assured that FTS for any of these four exons in Table 2 above cannot be due to an unknown primate or human hybrid more recent than the chimpanzee; they must signal a more distant species, primate or nonprimate. UNK (unknown sequence) is addressed below. Conversely, a “human” match (“H” in Table 2) to an amplicon from these four pairs of primers can only be a human or some human-like primate more recent than the chimpanzee, nothing else. Not even the gorilla, the pygmy chimpanzee, gibbons, or the orangutan (Pongo abelii) would align or sequence with these four primer pairs (too many mutations vs. the primers). Interestingly, the green monkey (Chlorocebus sabaeus) matches these Amel Y amplicons well, but on the X chromosome. No other monkeys or nonprimates come close to matching any of these human Amel X and Amel Y amplicons. Table 3 shows search results which demonstrate these points.
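The amplicon prediction described above can be sketched in code. This is a simplified illustration under stated assumptions: primers must match the template exactly (real PCR and Primer-BLAST™ tolerate some mismatches), the template is given as its plus strand, and the primer and template sequences in the example are invented toy strings, not the primers of Supplementary Data 12. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def predict_amplicon(template, forward_primer, reverse_primer):
    """Return the predicted amplicon (primers included), or None.

    The forward primer is located on the plus strand; the reverse primer
    binds the minus strand, so its reverse complement is located
    downstream on the plus strand. The amplicon runs from the first base
    of the forward primer site to the last base of the reverse primer site.
    """
    template = template.upper()
    fwd = forward_primer.upper()
    rev_site = reverse_complement(reverse_primer)
    start = template.find(fwd)
    if start == -1:
        return None
    rev_start = template.find(rev_site, start + len(fwd))
    if rev_start == -1:
        return None
    return template[start:rev_start + len(rev_site)]


def primer_mismatches(primer, site):
    """Count positions at which a primer differs from an equal-length site."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(a != b for a, b in zip(primer.upper(), site.upper()))


if __name__ == "__main__":
    toy_template = "GGGATCCGATTACAGGCTTAACGTTTT"  # hypothetical sequence
    amplicon = predict_amplicon(toy_template, "ATCCGA", "ACGTTA")
    print(amplicon, len(amplicon))
```

The predicted string, primers included, is what would then be searched against a reference database; counting mismatches between a primer and its site in another species gives a rough sense of whether that species could amplify at all.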
Conclusions
It is evident that Ketchum et al. did not observe the basic principles of DNA database search/match as outlined in [3], and they missed some matches that were in databases other than the Nucleotide Collection. These limited searches resulted in incorrect or nebulous species identifications and false conclusions. We found this previously.[3] Further, the conflicting results for different genes of the same sample (even three different species for S25/S26) raise questions about sample provenance and handling. Such mixed results DO NOT imply a new species, as suggested by Ketchum on the SGP website. Any new primate species or human/primate hybrid must match human and/or some primate above 95 %ID. As an example, we matched polar bear (best) and giant panda when there were no black bear data. Also, both we and Ketchum matched European polecat/domestic ferret (only 83 %ID, however) when only the Nucleotide Collection was searched. It’s a carnivore distantly related to bears. Genetic hybrids seem unlikely, because they should at least resemble one or more primates or amplify and sequence if they are more recent than the chimpanzee. However, everything above considered, S33, S35 and S37 might actually be worth pursuing. They come close to human, though they have some FTS. S35 has a pristine mtDNA haplogroup H10e with no extra mutations.[5] S37 has little gene data but also appears to be human with haplogroup H3 and only two extra mutations.[5] At the other end, S43 and S44 produced few matches. Also, S44 had 17 extra mtDNA mutations, the worst of 18 samples, with only one chance in 1,606,186,760 of being matrilineally related to modern humans.[5] Similarly, S39b had 12 extra mtDNA mutations, with a probability of one chance in 162,224 of being from the human population.[5] Samples S10, S25/26, S39b, S43 and S44 have little promise as bigfoot/sasquatch candidates.
Lessons learned from this study are:
Search all databases.
If unsuccessful, look for members of the same genus or family.
Failing these, the unknown sequence must be VERY different from ANY whole genome sequence AND be from an unexplored region of an incompletely sequenced genome (e.g. an uncommon or unimportant gene) or from an unsequenced genome, OR there must be something wrong with the data, which could be a “junk sequence” not previously identified and not amplified by the primers.
The phylogeny of the planet’s life is sufficiently well known, the species are sufficiently related to one another through the Evolutionary Tree of Life, and their DNA is sufficiently sequenced that a totally new form of life with NO match to ANY existing species is very unlikely at this time, especially when common/important genes are “sequenced.” Certainly such a form is not likely to be a primate, a member of the most studied mammalian order, all of whose members are closely related. Therefore, purported primate sequences which yield no matches must be considered very likely to be based on bad or junk data. When samples fail to align (FTA, FTS), either the wrong primers were used, PCR conditions were inappropriate, or there was insufficient target DNA in the sample. When samples align and sequence, especially with primers for important genes, but do not match any known species even remotely, the sequences are suspect and may be from degraded or intractable mixtures. Claims of a new primate species under either of these circumstances (FTA/FTS or unknown sequence) are totally unwarranted.
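These lessons amount to a simple reading of the best match found after all databases have been searched. The sketch below encodes only the %ID thresholds mentioned in this paper (above 99% for an exact species match, roughly 95-99% for a good but not exact match); the function name and the four labels are hypothetical, and a real interpretation must also weigh geographic range, database coverage and sample handling, as discussed above.

```python
def interpret_best_match(percent_id):
    """Classify the best %ID found across ALL databases (None = no hits)."""
    if percent_id is None:
        # Lesson 3: very different from everything known, or bad/junk data.
        return "no match"
    if percent_id > 99.0:
        return "species-level match"
    if percent_id >= 95.0:
        # Lesson 2: probably a member of the same genus or family.
        return "related species"
    return "distant relative"


if __name__ == "__main__":
    # %ID values from Table 1; None stands for "No significant similarity found".
    for label, pid in [("Amel X vs dog", 99.46),
                       ("HAR1 vs polar bear", 98.08),
                       ("Amel Y Ex2 vs ferret", 83.06),
                       ("S43 TAP1", None)]:
        print(label, "->", interpret_best_match(pid))
```

Such a classifier is only a first filter; the Amel Y Exon 2 match to polar bear at 100 %ID, for example, still has to be reconciled with the animal's range.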
When we began our
studies a year and a half ago, we had high hopes for the discovery of a new
hominid species. It was the natural
reaction of a lifelong amateur naturalist.
Now, after three exhaustive studies[3,5] of the Ketchum et al. results and conclusions[1], we
are left with the disappointing overarching conclusion that the existence of
bigfoot or sasquatch was not proven by them.
This does not mean that bigfoot/sasquatch does not exist, that the
tireless efforts of many field workers are wasted effort, that laboratory
protocols are invalid, or that all the reports of many observers are inaccurate. Hopefully, application of our methodology
will avoid future mistakes in interpretation if good DNA data is eventually
obtained.
Note in Passing
An early version of Table
1 was shared privately with Dr. Ketchum soon after the posting of the gene
sequences on the SGP webpage, with the hope that admitting her mistakes might
avoid more controversy and similar mistakes in the future. No response was received publicly or privately,
so this author had no responsible scientific alternative but to make these
mistakes public in this paper.
Conflict of Interest
The author declares no
conflict of interest.
Acknowledgement
Thanks go to the Sasquatch Genome Project for making their sequences available online. The author received no financial support for this work.
References
[1] Ketchum, M. S. et al. Novel North American Hominins: Next Generation Sequencing of Three Whole Genomes and Associated Studies. DeNovo, 2013, 1:1, Online only: http://sasquatchgenomeproject.org/view-dna-study/
[2] Khan, T.; White, B. Final Report on the Analysis of Samples Submitted by Tyler Huggins, Wildlife Forensic DNA Laboratory Case File 12-019; Trent University Oshawa: Peterborough, Ontario, Canada, 2012.
[3] Hart, H. V. Methodology and New Metrics for Distinguishing Related Species from Incomplete nuDNA. Unpublished. http://bigfootforums.com/index.php/topic/40487-the-ketchum-report-part-3/page-30?hl=ketchum#entry837515
[4] Sykes, B. C.; Mullis, R. A.; Hagenmuller, C.; Melton, T. W.; Sartori, M. Genetic Analysis of Hair Samples Attributed to Yeti, Bigfoot and Other Anomalous Primates. Proc. R. Soc. B, 2014, 281, 20140161.
[5] Hart, H. V. “But the mtDNA Sequences are all Human…” Really? https://www.facebook.com/groups/smartbigfoot/ and http://bigfootforums.com/index.php/topic/40487-the-ketchum-report-part-3/page-31
[6] Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. Basic Local Alignment Search Tool. J. Mol. Biol., 1990, 215 (3), 403-410.
[7] Madden, T. The BLAST Sequence Analysis Tool. In The NCBI Handbook; McEntyre, J.; Ostell, J., Eds.; National Center for Biotechnology Information: Bethesda, MD, 2003; http://www.ncbi.nlm.nih.gov/books/NBK21097/.
[8] Li, R., et al. The Sequence and De Novo Assembly of the Giant Panda Genome. Nature, 2010, 463, 311-317.
[9] Liu, S., et al. Population Genomics Reveal Recent Speciation and Rapid Evolutionary Adaptation in Polar Bears. Cell, 2014, 157 (4), 785-794.
[10] Levin, H. L. A Novel Mechanism of Self-primed Reverse Transcription Defines a New Family of Retroelements. Mol. Cell. Biol., 1995, 15 (6), 3310-3317.
[11] Whitcombe, D. M.; Theaker, J.; Gibson, N. J.; Little, S. Methods for Detecting Target Nucleic Acid Sequences. US Patent 6326145, 2001.
[12] Greene, M. D. Sasquatch for Sale: Death, DNA, and Duplicity; San Bernardino, CA, 2014; p. 232.