“But the Sasquatch Mitochondrial DNA is all Human…” Really?
By Haskell V. Hart
Copyright(c) Haskell V. Hart all rights reserved. May be reproduced for personal, noncommercial use only.
Abstract
While the nuclear DNA (nuDNA) of three samples in the Ketchum et al. sasquatch study has been shown to be from a bear, a human, and a dog, many supporters of this study point to the result that the mitochondrial DNA (mtDNA) is human, which they take to indicate a human hybrid sasquatch. In this paper the published mutations of all 29 mtDNA samples in the Ketchum study were examined in detail and compared to the Poisson Distribution of mutations. Although most samples were within the normal range of number of private (extra) mutations (≤6) from their respective haplogroups (and had probabilities ≥ 2.3%), eight of the 18 samples with complete mtDNA sequences exceeded six mutations (<1% probability). Of the other 11 samples, which had HVR-1-only results, eight had an extra mutation. Possible reasons for this are discussed. Submission of blind samples of cat, dog, and horse (plus two human controls) to a reputable commercial laboratory showed that nonhuman samples cannot produce human-like mtDNA results.
Introduction
In February, 2013,
Ketchum et al.[1] published a paper
that reported sequencing three samples (S26, S31, and S140) of nuDNA and 29
samples of mtDNA, with the conclusion that these were from a hybrid of a
previously unknown primate male and a modern human female. Subsequently the three nuDNA samples were
shown to be from a black bear, a human, and a dog, respectively.[2,3(Samples
25104, 25106),4] The Ketchum claim also relied on the results
of the mtDNA sequencing, which for all 29 samples were reported to be
human. Because of this puzzling
contradiction for S26 and S140, all the reported mtDNA mutations in Ketchum et al.[1] “Supplementary Data 2” have been examined in detail for
anomalies. There were many.
Methods and Materials
1. Computer
The mtDNA mutations in the Ketchum et al.[1] Supplementary Data 2 were found to be based on rCRS (revised Cambridge Reference Sequence), which is haplogroup H2a2a1. There is no explanation in the text or caption to indicate this, or that 11 samples were HVR-1 (a.k.a. HVS-1 or HV-1) only and 18 samples were from complete mtDNA sequencing. Throughout the table, some mutations were listed without a suffix (A, G, T, or C). These were counted as having the appropriate suffix for a transition, based on the prefix in the accepted human mtDNA Phylotree Build 16 (February 19, 2014)[5] and assuming only A→G, G→A, T→C, or C→T (no transversions). The 18 sets of complete mtDNA mutations were entered as input to the program FASTmtDNA to convert them from rCRS to RSRS (Reconstructed Sapiens Reference Sequence).[6] This system references all mutations from the MRCA (Most Recent Common Ancestor) of all humans, “Mitochondrial Eve”. While either reference system could have been used, the choice of RSRS allows the use of the program mtDNAble, which uses the output of FASTmtDNA to determine the haplogroup and the RSRS-based mutations (output in a Microsoft Excel™ file). Both programs are available free through mtDNA Community, an outgrowth of [6], and can be downloaded from its website: http://www.mtdnacommunity.org/downloads.aspx.
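The suffix rule described above can be illustrated with a minimal sketch. It assumes that the base before the mutation (the Phylotree prefix) is supplied by the caller; the function name `complete_mutation` and its arguments are hypothetical and are not part of FASTmtDNA or mtDNAble. Deletions and other non-substitution labels are not handled.

```python
# Transition partners: purine <-> purine (A, G) and pyrimidine <-> pyrimidine (C, T).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def complete_mutation(label: str, prefix_base: str) -> str:
    """Return the mutation label with a transition suffix added if it lacks one.

    label       -- mutation as listed, e.g. "73" (no suffix) or "263G" (complete)
    prefix_base -- base before the mutation at that position, e.g. the Phylotree
                   prefix (hypothetical input supplied by the caller)
    """
    label = label.strip()
    if label[-1].upper() in TRANSITION:
        return label  # already carries an A, G, C, or T suffix
    return label + TRANSITION[prefix_base.upper()]


print(complete_mutation("73", "A"))    # bare position, prefix base A
print(complete_mutation("263G", "A"))  # already complete, returned unchanged
```

Because transversions are excluded by assumption, each prefix base maps to exactly one suffix, so a bare position is never ambiguous under this rule.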
All searches and alignment comparisons were performed with BLAST™ search/match software[7, 8] against the GenBank Nucleotide Database on the National Center for Biotechnology Information (NCBI) website: http://www.ncbi.nlm.nih.gov/. The use of this database and search/match software is free to the general public, so that every result reported here can be verified. Such is not the case in Ketchum et al.[1]
2. Statistical
Random, extra, or “private” DNA mutations are known to follow the statistical Poisson Distribution (see, for example, [9]). Such mutations do not define the haplogroup and are considered “extra” in that context. The Poisson Distribution describes any process that involves multiple occurrences, each with a fixed probability per unit length (such as a DNA sequence), unit area, unit volume, or unit time. A common example is the number of telephone calls per hour to a service center. In our case the probability of a particular human sample having X = k extra mutations in the complete mitochondrial genome is

$$\Pr(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!} \qquad (1)$$

where k = 0, 1, 2, …, and λ equals the average value of X over many independent trials, here different humans. This relationship will be applied to mutations in NCBI database entries (haplogroup H1a), and the resulting value of λ will be used to calculate probabilities of occurrence for the number of mutations in the Ketchum Supplementary Data 2 samples with complete mitochondrial sequences. The likelihood of these samples being from the known human population will be calculated from Equation (1) as Pr(X=k) for samples with k mutations.
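For readers who wish to check the probabilities, the following is a minimal sketch of the Poisson calculation in Python rather than in Excel. It uses the value λ = 2.3714 reported in the Results; the function names `poisson_pmf` and `one_chance_in` are illustrative only.

```python
import math

LAMBDA = 2.3714  # average number of extra mutations reported for the 35 H1a samples


def poisson_pmf(k: int, lam: float = LAMBDA) -> float:
    """Poisson probability of exactly k extra mutations."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def one_chance_in(k: int, lam: float = LAMBDA) -> float:
    """Reciprocal of the probability, i.e. '1 chance in N'."""
    return 1.0 / poisson_pmf(k, lam)


for k in (0, 6, 7, 16):
    print(f"k={k:2d}  Pr(X=k)={poisson_pmf(k):.3g}  one chance in {one_chance_in(k):,.0f}")
```

Small differences from the tabulated values are to be expected, since the table was presumably computed with the unrounded average rather than 2.3714.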
A more sophisticated
statistical model is the negative binomial distribution, in which λ is allowed
to vary according to the different probabilities of a mutation at different
sites.[10] This model requires much
larger data sets (high hundreds to thousands of samples) to be justified.
All statistical
calculations and graphics were done with Microsoft Excel™.
3. Biological
Blind buccal swab samples of cat (Felis catus), dog (Canis lupus familiaris), and horse (Equus caballus), plus two samples from each of two unrelated humans (one male, one female) as controls, were submitted as human samples to a reputable DNA sequencing laboratory according to its written protocols, which required that duplicate samples be submitted. Results were returned through the laboratory’s website.
Results and Discussion
Results of the biological sample submissions were as follows:
F. catus: Failure to Sequence.
C. lupus fam.: Failure to Sequence.
E. caballus: Failure to Sequence.
Control 1 (male): 1) HVR-1 and HVR-2: Clade H; 2) Complete sequence: Haplogroup H27.
Control 2 (female): 1) HVR-1 and HVR-2: Clade W; 2) HVR-1 and HVR-2: Clade W.
In the case of each human control, the reported HVR-1 and HVR-2 mutations were identical for submissions 1) and 2). These data show that the primers used by this laboratory are specific to human mtDNA and will not amplify nonhuman mammalian mtDNA. Also, the laboratory’s sequencing is reproducible. As a check, whole genome mutations from Control 1, submission 2) were input to FASTmtDNA, and those results to mtDNAble, which computed haplogroup H27, the same haplogroup determined by the commercial laboratory (above).
Initial focus was on S26 (the bear with haplogroup H1a). The NCBI Nucleotide database was queried for entries with “H1a” in the title or the “haplogroup =” field. Thirty-five were found. Insertions at positions 309 and 515-522 are discounted, as well as any mutations at 16182, 16183, 16193, or 16519, as these are very common and not used in the construction of the mtDNA Phylotree Build 16 of van Oven,[5] which was used to verify every extra mutation. Also, missing topological mutations were considered reverse mutations and therefore included in the number of extra mutations. The average number of mutations in this set was 2.3714, which was used as the value of λ, from which the Poisson Distribution of mutations (n=35) was calculated by Equation (1) for comparison. These results are tabulated in Table 1 and graphed in FIG. 1. From these it is seen that S26, with 16 extra mutations, is an outlier with a probability of only 0.000000004, or 1 chance in 224,056,304, of being in the known human population. The statistical comparison in FIG. 1 is very similar to other applications of the Poisson Distribution to human mtDNA mutations,[9] but is tighter (lower value of λ) because only one haplogroup is involved, whereas [9] involved entire geographical populations in each statistical analysis. Divergence causes λ to increase.
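The estimate of λ and the expected counts can be reproduced from the observed H1a counts listed in Table 1. The sketch below is an independent check in Python, not the Excel worksheet used for the paper; the variable names are illustrative.

```python
import math

# Observed H1a data from Table 1: number of extra mutations -> number of samples.
observed = {0: 3, 1: 7, 2: 9, 3: 10, 4: 3, 5: 2, 6: 1}

n = sum(observed.values())                            # total number of samples
lam = sum(k * c for k, c in observed.items()) / n     # sample mean, used as lambda


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k extra mutations."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Expected number of samples with k mutations in a set of n samples.
expected = {k: n * poisson_pmf(k, lam) for k in observed}

print(f"n = {n}, lambda = {lam:.4f}")
for k in observed:
    print(f"k={k}: observed {observed[k]:2d}, expected {expected[k]:.2f}")
```

The sample mean is the maximum-likelihood estimate of λ for a Poisson distribution, which is why the average number of extra mutations can be used directly.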
Table 1. Poisson Distribution for H1a

| No. of Mutations (k)ᵃ | No. of Samplesᵃ | Pr(X=k)ᵇ | One Chance Inᶜ | 35 × Pr(X=k)ᵈ |
|---|---|---|---|---|
| 0 | 3 | 0.093347278 | 11 | 3.3 |
| 1 | 7 | 0.221366401 | 4.5 | 7.7 |
| 2 | 9 | 0.262477305 | 3.8 | 9.2 |
| 3 | 10 | 0.20748206 | 4.8 | 7.3 |
| 4 | 3 | 0.123007221 | 8.1 | 4.3 |
| 5 | 2 | 0.058340568 | 17 | 2.0 |
| 6 | 1 | 0.023058415 | 43 | 0.81 |
| 7 | 0 | 0.007811626 | 128 | 0.27 |
| 8 | 0 | 0.002315589 | 432 | 0.081 |
| 9 | 0 | 0.000610139 | 1,639 | 0.021 |
| 10 | 0 | 0.00014469 | 6,911 | 0.005 |
| 11 | 0 | 3.1193E-05 | 32,059 | 0.001 |
| 12 | 0 | 6.16432E-06 | 162,224 | 0.0002 |
| 13 | 0 | 1.12448E-06 | 889,299 | 0.00004 |
| 14 | 0 | 1.90473E-07 | 5,250,081 | 0.000007 |
| 15 | 0 | 3.01129E-08 | 33,208,345 | 0.000001 |
| 16 | 0 | 4.46316E-09 | 224,056,304 | 0.0000002 |
| 17 | 0 | 6.22593E-10 | 1,606,186,760 | 0.00000002 |
| Sum | 35 | 1.000000000 | | 35.00000000 |

ᵃ Corrected to discount common mutations not used to construct Phylotree Build 16[5]; see text. Blue bar graph in FIG. 1.
ᵇ From Equation (1) with λ = 2.3714, the average k for 35 H1a samples; see text.
ᶜ = 1/Pr(X=k)
ᵈ The expected number of samples with k mutations in a set of 35 samples, rounded off to conserve space. Red bar graph in FIG. 1.
Sums reflect full accuracy of each entry.
FIG. 1.
Distribution of mutations from H1a in Phylotree[5]: 35 H1a samples
(blue) and the corresponding Poisson Distribution (n=35) with the same value of
λ = 2.3714, as calculated from Equation (1) (red). Above eight mutations the probabilities are
so low that the bars are not plottable or observable on this scale (see Table
1).
Table 2 presents mutation results for all 18 samples with a complete mitochondrial genome sequence. Haplogroups and raw rCRS-based mutations were taken from Ketchum et al.[1] Supplementary Data 2, and the latter were converted by FASTmtDNA and mtDNAble to a haplogroup and an RSRS-based list of mutations. Numbers of mutations were corrected as described above for H1a. Most haplogroup differences from Ketchum to mtDNAble were minor. Major differences for S26, S39b, and S44 were due to large numbers of extra mutations. These samples did not fit into the established human mitochondrial phylogenetic tree of mutations at www.phylotree.org,[5] as discussed below. Assuming the same Poisson Distribution as for H1a in Table 1, eight samples had numbers of mutations with probabilities less than 0.01 (less than 1% chance, or 1 in over 100), most of these considerably less. As a cross check, all samples were aligned against the NCBI Nucleotide database, which had 27,156 complete human mitochondrial sequences as of July 20, 2014. The same eight samples had poor matches against the database, i.e. unacceptably high numbers of mismatches and gaps. Note that although the %ID for these eight samples vs. their best matches was greater than 99.9%, the human mitochondrial genome is so well defined that a 0.1% deviation from established haplogroups is an outlier. In fact, the results for H1a database samples in Table 1 suggest that more than 6 / 16,568 (0.036%) extra mutations constitute an outlier worthy of reexamination (less than 1% probability). An additional 20 H1a samples found on the mtDNA Community website (http://www.mtdnacommunity.org/downloads.aspx, “NCBI samples utilized by mtDNA Community”) also had only 1-3 extra mutations each, consistent with the distributions in Table 1 and FIG. 1.
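The outlier criterion can be applied mechanically to the mutation counts listed in Table 2. The sketch below assumes the same λ = 2.3714 for every haplogroup, as the text does, and uses the point probability Pr(X=k) rather than a cumulative tail probability; the dictionary and variable names are illustrative.

```python
import math

LAMBDA = 2.3714        # from the 35 H1a database samples
THRESHOLD = 0.01       # Pr(X=k) below this marks an outlier
GENOME_LENGTH = 16568  # genome length used in the text

# Number of extra mutations (k) for the 18 Ketchum samples with complete sequences.
extra_mutations = {
    "1": 2, "2": 13, "4": 4, "11": 4, "12": 6, "24": 3, "26": 16, "28": 8,
    "29": 7, "31": 4, "35": 0, "36": 3, "37": 2, "38": 1, "39b": 12,
    "44": 17, "46": 12, "138": 7,
}


def poisson_pmf(k: int, lam: float = LAMBDA) -> float:
    """Poisson probability of exactly k extra mutations."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


outliers = [s for s, k in extra_mutations.items() if poisson_pmf(k) < THRESHOLD]
print("Outliers:", ", ".join(outliers))
print(f"6 extra mutations = {100 * 6 / GENOME_LENGTH:.3f}% of the genome")
```

With this value of λ the probability first drops below 0.01 at k = 7, which is why six extra mutations serves as the practical cutoff.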
A survey of T2b sequences in the Nucleotide Database confirmed that S2 is an outlier, while S1, S12, S36, ES-1, and ES-2 are within the normal range of extra mutations for T2b. Similarly, S29, S44, S46, and S138 are outside the normal range of extra mutations for H2a2. In both cases the distributions of extra mutations were similar to that of H1a, confirming that the H1a statistics in Table 1 and FIG. 1 are valid enough for other haplogroups to identify outliers as in Table 2. Much greater numbers of samples might reveal slightly different distributions (λ) for each haplogroup, however. Such an analysis is beyond the scope of this study.
Table 2. Samples with Complete mtDNA Sequences

| Sample IDᵃ | Haplogroup (Ketchum) | Haplogroup (mtDNAble) | No. of Mutations, kᵇ | Poisson Pr(X=k)ᶜ | One Chance Inᵈ | Best Matchᵉ |
|---|---|---|---|---|---|---|
| 1 | T2b | T2b | 2 | 0.262 | 3.8 | 3,0 (JX153739.1) |
| 2* | T2b | T2b | 13 | 0.000001 | 889,299 | 14,0 (JX153739.1) |
| 4 | H3 | H3 | 4 | 0.123 | 8.1 | 3,1 (JX153639.1) |
| 11 | A6L2c | L2c3 | 4 | 0.123 | 8.1 | 9,0 (DQ304989.1) |
| 12 | T2b | T2b | 6 | 0.023 | 43 | 8,0 (JX297190.1) |
| 24 | H1s | H1ba | 3 | 0.207 | 4.8 | 4,0 (JQ702799.1) |
| 26* | H1a | H5e | 16 | 0.000000004 | 224,056,304 | 16,0 (JX153188.1) |
| 28* | H1 | H1ba | 8 | 0.002 | 432 | 7,1 (KF161997.1) |
| 29* | H2a2 | H2a2 | 7 | 0.008 | 128 | 8,0 (JX153451.1) |
| 31 | L0d2a | L0d2a1 | 4 | 0.123 | 8.1 | 5,0 (KC346174.1) |
| 35 | H10 | H10e | 0 | 0.093 | 11 | 0,0 (KC257308.1) |
| 36 | T2b | T2b | 3 | 0.207 | 4.8 | 5,0 (JX297190.1) |
| 37 | H3 | H3 | 2 | 0.262 | 3.8 | 2,0 (JX153533.1) |
| 38 | V2 | V2c | 1 | 0.221 | 4.5 | 0,0 (JQ705254.1) |
| 39b* | T2 | R2'JT | 12 | 0.000006 | 162,224 | 17,0 (KJ690074.1) |
| 44* | H2a2 | T2 | 17 | 0.0000000006 | 1,606,186,760 | 15,0 (JQ703290.1) |
| 46* | H2a2 | H2a2 | 12 | 0.000006 | 162,224 | 13,0 (JX153451.1) |
| 138* | H2a2 | H2a2 | 7 | 0.008 | 128 | 8,0 (JX153451.1) |
| C-3 | HV | HV | 4 | 0.207 | 4.8 | 4,0 (KC765916.1) |
| ES-1ᶠ | none | T2b | 1 | 0.221 | 4.5 | 2,0 (JN106403.1) |
| ES-2ᶠ | none | T2b | 2 | 0.262 | 3.8 | 3,0 (JX153739.1) |

* Outlier, <1% probability of being in the normal human population: Pr(X=k) < 0.01, or One Chance In > 100.
ᵃ From Ketchum et al.[1] Table 1 and Supplementary Data 2.
ᵇ Corrected to discount common mutations not used to construct Phylotree Build 16[5]; see text.
ᶜ Calculated from Equation (1) with λ = 2.3714, same as in Table 1. Rounded off to fit the table.
ᵈ = 1/Pr(X=k) as in Table 1.
ᵉ Best match in Nucleotide database as: mismatches, gaps (Accession Number).
ᶠ Extra sample not contained in Ketchum et al.[1] Table 1 or haplogrouped in Supplementary Data 2. Possible controls (?).
The existence of an evolutionary phylogenetic tree requires that mutations occur in a continuous fashion down each branch of the tree only. Crossovers between branches are not allowed; a pattern of evolution that included them would resemble a maze or a lattice, not a tree. Yet this is what samples such as S26, S39b, and S44 would require to be human, since they don’t match any one haplogroup very well. Supplementary FIG. 1 demonstrates this point for S26. S39b and S44 present similar phylogenetic dilemmas.
The Phylotree of mtDNA
was built on over 20,000 human samples.
As new data is added, radically new deep-rooted branches do not occur at
this advanced stage of Phylotree (Build 16).
New human mtDNA phylogeny is nearly always added near the tips of the
existing branches. It would take a lot
more samples to reconcile S26, S39b and S44 with any future build of the
Phylotree. Such is not likely to
occur. It may even be impossible.
The 11 samples of HVR-1 (only) mutations were not analyzed statistically because of the many examples of ambiguity involved in this screening technique, primarily due to homoplasy.[11] For this reason, DNA laboratories usually report at least HVR-1 AND HVR-2 mutations in deciding a haplogroup (actually only the clade), as for example in the case of the biological control samples above. A complete haplogroup requires a complete sequence. In our case, eight samples have an extra mutation, and one is misgrouped but otherwise normal (S33 should be U5). Three of the eight (S71, S117, and S118) have equally likely alternate haplogroups (L3d or L3e4), all according to [6], Supplementary Table S2. These three present the same phylogenetic conundrum as S26 mentioned above (SUPP. FIG. 1). There could be other extra mutations in the HVR-2 and coding regions which were not determined by this limited laboratory analysis. Thus, only three of the 11 should be considered as phylogenetically human by this very limited HVR-1 criterion.
Conclusions
Based on a limited study of cat, dog, horse, and human samples, nonhuman samples do not amplify or sequence with human mtDNA primers. Consequently, samples that analyze for human mtDNA and animal nuDNA MUST BE MIXTURES of the species, e.g. S26 and S140. The 1,000- to 10,000-fold per-cell copy-number advantage of mtDNA over nuDNA allows a much smaller number of human cells (0.01 to 0.1% of total) to show up in mtDNA analysis; and if only human primers are used, the results can be misleading. Over a year ago, the author recommended to Dr. Ketchum that nonhuman primers (such as bear and dog) be used on some of the ambiguous samples. No response was received. If only human primers are used, only human mtDNA will be sequenced. A sea of nonhuman mtDNA would be missed. The same applies to nuDNA at specific gene loci. This subject will be addressed in a future paper.
Samples 2, 26, 28, 29,
39b, 44, 46, and 138 are not human as defined by the accepted Phylotree for
human mtDNA (all with less than 1% probability of occurring, most much less). However, they do not match any other animal
nearly as closely as they match human (top 500 matches were all human in every
case).
Of the 11 HVR-1-only
samples eight have one extra mutation.
Another one was misgrouped but normal, which leaves a total of three
that are potentially human as far as the limited HVR-1 information can
tell. However, the remarkable claim of
sasquatch mtDNA demands a full sequence for credibility.
The two most likely
causes for these anomalies are:
(1) Sequencing errors, due to small amounts of
sample and/or contamination, and/or degradation.
(2) Primate male/human female hybridization events, as suggested by Ketchum et al., in the sufficiently distant past that additional mtDNA mutations have since occurred along nonhuman evolutionary lines. Keep in mind that Denisovan mtDNA matches human 99.7%. Neanderthal matches human 98.9%. Denisovan matches Neanderthal 99.7%. So the purported sasquatch hybridization events resulting in present-day 0.036 – 0.1% mismatches for 18 samples (99.96-99.9% agreement) would have had to occur more recently than the divergence of these two subspecies (assuming equal rates of mutation throughout).
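The percentages in (2) can be related to numbers of mismatches with simple arithmetic. The sketch below assumes that each percentage refers to the full mitochondrial genome length of 16,568 used earlier in this paper; the function names are illustrative.

```python
GENOME_LENGTH = 16568  # genome length used earlier in the text


def percent_mismatch(n_mismatches: int) -> float:
    """Mismatches expressed as a percentage of the complete mtDNA genome."""
    return 100.0 * n_mismatches / GENOME_LENGTH


def mismatches_for_identity(percent_identity: float) -> int:
    """Approximate number of mismatches implied by a percent identity."""
    return round((100.0 - percent_identity) / 100.0 * GENOME_LENGTH)


# Range of extra mutations discussed for the Ketchum samples (6 to 17).
for n in (6, 17):
    print(f"{n} mismatches = {percent_mismatch(n):.3f}%")

# Identities quoted in the text for the archaic comparisons.
for label, pct in (("Denisovan vs. human", 99.7), ("Neanderthal vs. human", 98.9)):
    print(f"{label}: about {mismatches_for_identity(pct)} mismatches")
```

On this basis the quoted archaic comparisons imply several times more mismatches than the 6 to 17 extra mutations discussed for the Ketchum samples.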
Sample 26 remains an
enigma. Bear nuDNA cannot go with nearly
human mtDNA. Human and bear mtDNA align
only about 75%, and the species have different numbers of chromosomes (46 and
74, respectively). We are left with (1)
as the cause for mtDNA anomalies in this sample.
Sample 140 (dog nuDNA)
has an anomalous mtDNA mutation, 16176?, in the HVR-1 region, which might also
be due to (1). Dog nuDNA cannot go with
human mtDNA either (78 vs. 46 chromosomes, respectively).
Sample 31 (human nuDNA) does indeed have an acceptable number of mtDNA mutations (4) for a human sample. This sample IS human by ALL DNA measures. Could it also be a sasquatch? Only if sasquatch are feral humans. In that case, why do so many of the mtDNA samples fail the test, i.e. fall outside the range of human mutations, as shown above? Is more than one species or subspecies involved? Are hybridization events sufficiently spaced out in time to allow greatly different numbers of subsequent mutations (as in Table 2)? Could these be along nonhuman evolutionary lines? Answers to these and other such questions require much more data: good data from controlled samples.
There is enough
information in this paper for anyone to validate every single result.
Future work will
address the specific gene sequences in Ketchum et al.[1], Supplementary Data 3, and on the Sasquatch Genome
Project website: http://sasquatchgenomeproject.org/. These too are not all what they were reported
to be.
Conflict of Interest
The author declares no conflicting
interests.
Acknowledgement
Thanks go to the
Sasquatch Genome Project for sharing their data online. No financial support was received for this
work.
References
[1] Ketchum, M. S. et al. Novel North American Hominins: Next Generation Sequencing of Three Whole Genomes and Associated Studies. DeNovo, 2013, 1:1. Online only: http://sasquatchgenomeproject.org/view-dna-study/
[2] Khan, T.; White, B. Final Report on the Analysis of Samples Submitted by Tyler Huggins, Wildlife Forensic DNA Laboratory Case File 12-019; Trent University Oshawa: Peterborough, Ontario, Canada, 2012. http://www.bigfootbuzz.net/bart-cutino-tyler-huggins-release-sierra-kills-sample-dna-results/
[3] Sykes, B. C.; Mullis, R. A.; Hagenmuller, C.; Melton, T. W.; Sartori, M. Genetic Analysis of Hair Samples Attributed to Yeti, Bigfoot and Other Anomalous Primates. Proc. R. Soc. B, 2014, 281, 20140161.
[4] Hart, H. V. Methodology and New Metrics for Distinguishing Related Species from Incomplete nuDNA. Unpublished.
[5] van Oven, M. Revision of the mtDNA Tree and Corresponding Haplogroup Nomenclature. Proc. Natl. Acad. Sci. USA, 2010, 107(11), E38-E39. http://dx.doi.org/10.1073/pnas.0915120107
[6] Behar, D. M.; van Oven, M.; Rosset, S.; Metspalu, M.; Loogväli, E.-L.; Silva, N. M.; Kivisild, T.; Torroni, A.; Villems, R. A “Copernican” Reassessment of the Human Mitochondrial DNA Tree from Its Root. Am. J. Hum. Genet., 2012, 90(4), 675-684. http://dx.doi.org/10.1016/j.ajhg.2012.03.002
[7] Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. Basic Local Alignment Search Tool. J. Mol. Biol., 1990, 215(3), 403-410.
[8] Madden, T. The BLAST Sequence Analysis Tool. In The NCBI Handbook; McEntyre, J.; Ostell, J., Eds.; National Center for Biotechnology Information: Bethesda, MD, 2003; http://www.ncbi.nlm.nih.gov/books/NBK21097/.
[9] Di Rienzo, A.; Wilson, A. C. Branching Pattern in the Evolutionary Tree for Human Mitochondrial DNA. Proc. Natl. Acad. Sci. USA, 1991, 88, 1597-1601.
[10] Tamura, K.; Nei, M. Estimation of the Number of Nucleotide Substitutions in the Control Region of Mitochondrial DNA in Humans and Chimpanzees. Mol. Biol. Evol., 1993, 10(3), 512-526.
[11] Behar, D. M.; Rosset, S.; Blue-Smith, J.; Balanovsky, O.; Tzur, S.; Comas, D.; Mitchell, R. J.; Quintana-Murci, L.; Tyler-Smith, C.; Wells, R. S. The Genographic Project Public Participation Mitochondrial DNA Database. PLOS Genet., 2007, 3(9), e169.
SUPP. FIG. 1. Alternate haplogroups for S26: haplogroup H1a
with evolution along (1) in red, and haplogroup H5e,
with evolution along (2) in blue, are equally likely, but crossovers to include
all six mutations are not allowed. In either case, there are left three more extra mutations, for a total of 16 as
seen in Table 2. Black segments are schematic
and incomplete. For complete subbranches
see [5, 6]. Directions (1A) and (2A) represent hypothetical
(?) sasquatch (nonhuman) evolution paths SINCE a hybridization event
involving a H1a human female or a H5e human female, respectively. Each path would have 16 more mutations (not listed
above) to reach S26, which would include the three mutations shown above on the
red or blue path to the other haplogroup.
Similar phylogenetic conundrums exist for S39b and S44 also (see Table
2).