
Saturday, September 7, 2019

Sample 26 mtDNA Electropherograms Show Degradation/Contamination




Figure 1.  Ketchum S26 Electropherograms from SGP website.  Green bars above each peak proportional to quality of read.  Peak color indicates base:  A T  C.  For research and education only, "Fair Use." 

Melba Ketchum has consistently claimed in her paper [1], on radio, on TV, on the Sasquatch Genome Project (SGP) website [2], on YouTube [3], and at every other opportunity that her samples are not contaminated or degraded, in spite of very anomalous results, many of which are easily explained by contamination or degradation.  She posted the raw mtDNA electropherogram files for Sample 26 (the Smeja find) on the SGP website.  I downloaded all 272 files, which require special software to open, examine, and assemble into a complete mtDNA sequence.  I purchased DNABaser software from Heracle Biosoft to do just that.  It cost $159 and runs on my PC.  Fig. 1 above is a screen grab of six randomly selected, representative electropherograms from the 272.

This data was obtained by the Sanger Dideoxy Sequencing Method, outlined in Fig. 2.


Figure 2.  Sanger Dideoxy Method.  After Wikipedia.  For research and education only, "Fair Use."   

In this method, segments are elongated (as complementary strands) and terminated with "fluorochrome"-labeled (i.e. fluorescently tagged) dideoxynucleotides, with a different fluorescent wavelength for each of the four bases (A, G, T, C).  The terminated segments are separated by electrophoresis according to their length, then detected and identified by their characteristic fluorescent wavelength (stimulated by a laser).  Software plots the signal, as represented schematically in Fig. 2 (lower right) and as actual data in Fig. 1, and generates a sequence (shown below each electropherogram in Fig. 1 and at lower right in Fig. 2).
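The read-out logic can be sketched in a few lines of Python.  This is only a toy model under idealized assumptions (exactly one terminated fragment of every length, no noise); the function name `sanger_read` is my own invention for illustration and is not part of any sequencing software.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_read(template, seed=0):
    """Toy model: build chain-terminated fragments, then read them by length."""
    rng = random.Random(seed)
    # The polymerase builds the reverse complement of the template (5'->3').
    new_strand = "".join(COMPLEMENT[b] for b in reversed(template))
    # Idealized assumption: one fragment of every length, each carrying the
    # fluorescent tag of its final (terminating) base.
    fragments = [(n, new_strand[n - 1]) for n in range(1, len(new_strand) + 1)]
    rng.shuffle(fragments)  # in the reaction tube the fragments are all mixed
    # Electrophoresis sorts by length: the shortest fragment is detected first.
    fragments.sort(key=lambda frag: frag[0])
    # Reading the tag colors in order of arrival recovers the sequence.
    return "".join(tag for _, tag in fragments)


print(sanger_read("ATGC"))
```

Each peak in a real electropherogram corresponds to one fragment length in this model.  When fragments from other sources have the same length but a different final base, two colors arrive at once, which is the overlapping-peak problem visible in Fig. 1.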

All but the shortest (few hundred bp) sequences must be sequenced in segments, and then assembled by computer, based on overlaps, as shown in Fig. 3.

Figure 3.  Assembly.  Overlaps underlined.  After Wikipedia.  For research and education only, "Fair Use."


Mitochondrial DNA assembly is handily done on a PC.  Only very large data sets from next generation sequencing (e.g. whole nuclear genomes) require a "super computer" or a network of linked servers to assemble.
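For readers who want to see the idea behind overlap-based assembly, here is a minimal greedy sketch in Python.  It assumes error-free reads, exact suffix-to-prefix overlaps, and a single strand orientation; real assemblers such as DNABaser tolerate mismatches and use quality scores, so this is not their algorithm.  The names `overlap_len` and `greedy_assemble` are hypothetical.

```python
def overlap_len(a, b, min_len=3):
    """Length of the longest suffix of `a` that exactly equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest exact overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap_len(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps any more: the rest stay separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


print(greedy_assemble(["ATGCGT", "CGTTAC", "TACGGA"]))
```

With clean reads the three fragments collapse into one contig.  Reads that share no reliable overlap are left over as separate pieces, which is what happens when many of the input traces are unreadable.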

Sanger sequencing and subsequent assembly do not work for severely degraded DNA, whether from a single source or multiple sources.  This is because of the presence of sequence fragments which are self-primed and which are nevertheless elongated and tagged with a fluorescent probe.  These fragments coelute with the fragments of interest, as shown in Fig. 1 ("Poor").  The "Excellent" electropherogram has no such coelution:  each peak stands alone and is separated from its neighbors, so that the software can make an unambiguous base call (A, G, T, or C) from the detector signal.  The "Good" electropherogram can also be used, because in most cases the coeluting peaks are much smaller and can be ignored.  The few ambiguous double peaks will hopefully be clarified by another overlapping sequence fragment.  Otherwise these few remain unknown bases (or possibly polymorphisms).  The "Poor" electropherograms are totally uninterpretable:  they have multiple overlapping peaks of comparable intensity.  The "?" electropherograms might have some salvageable sequence runs, but should be considered dubious at best.  So only two of these six e-grams are usable.  A well collected and preserved single-species DNA sample would produce only usable e-grams.
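The distinction between these categories can be made concrete with a rough Python sketch that scores a trace by how often a second dye channel rivals the main peak.  The cutoffs (`mixed_cutoff`, `good_max`, `excellent_max`) and the input layout (one dict of four channel intensities per called position) are assumptions I made up for illustration; they are not the criteria used by DNABaser or by the labels in Fig. 1.

```python
def secondary_ratio(peak):
    """Second-highest channel intensity divided by the highest, at one position."""
    top, second = sorted(peak.values(), reverse=True)[:2]
    return second / top if top > 0 else 1.0


def classify_trace(peaks, mixed_cutoff=0.5, good_max=0.1, excellent_max=0.0):
    """Label a trace by the fraction of positions with a strong coeluting peak.

    All three thresholds are hypothetical values chosen for this example.
    """
    if not peaks:
        raise ValueError("trace has no called positions")
    mixed = sum(1 for p in peaks if secondary_ratio(p) >= mixed_cutoff)
    fraction_mixed = mixed / len(peaks)
    if fraction_mixed <= excellent_max:
        return "Excellent"
    if fraction_mixed <= good_max:
        return "Good"
    return "Poor"


clean = {"A": 100, "C": 2, "G": 1, "T": 3}   # one dominant peak
mixed = {"A": 100, "C": 90, "G": 5, "T": 4}  # two peaks of comparable height
print(classify_trace([clean] * 9 + [mixed]))
```

A trace with an occasional double peak still comes out usable under these assumed cutoffs, while one where comparable-height peaks are common does not.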

Unfortunately, the 272 electropherograms contained only 47 that could be assembled into a sequence of 5260 bp out of an expected 16,568 bp human mitochondrial genome.  The remaining 225 electropherograms contained only 11 which could be assembled into a 1923 bp sequence.  Other assemblies were shorter still.

The first assembly (47 e-grams) aligned with base positions 11401-16551 of the published S26 mtDNA sequence [4] at only 96% identity, with 74 gaps.  The second-best assembly (11 e-grams) aligned with base positions 13718-15435 of the published sequence [4] at only 97% identity, with 27 gaps.  This is rather poor agreement between these raw data and the published S26 sequence [4].  The 47 e-gram sequence aligned with the 11 e-gram sequence at only 93%, with 1643 identities and 69 gaps.
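For anyone unfamiliar with how such figures are derived, the following Python sketch counts identities and gaps in a pairwise alignment that has already been computed.  It assumes the two sequences are given as equal-length strings with "-" marking gaps, and that percent identity means identities divided by alignment length; alignment tools differ on that definition, so treat this as one plausible convention rather than the one used above.  The function name `alignment_stats` is hypothetical.

```python
def alignment_stats(aligned_a, aligned_b):
    """Return (identities, gap_columns, percent_identity) for a gapped alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    pairs = list(zip(aligned_a, aligned_b))
    # A column is an identity when both bases match and neither is a gap.
    identities = sum(1 for x, y in pairs if x == y and x != "-")
    # A column is a gap column when either sequence has "-" there.
    gaps = sum(1 for x, y in pairs if x == "-" or y == "-")
    percent = 100.0 * identities / len(pairs)
    return identities, gaps, percent


ident, gaps, pct = alignment_stats("ACGT-ACGT", "ACGTTAC-T")
print(ident, gaps, round(pct, 1))
```

Under this convention, 1643 identities at 93% would imply an alignment roughly 1643 / 0.93 ≈ 1767 columns long.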

Thus, only 32% of the mtDNA genome could be sequenced with these data, and the sequence agreed very poorly with the published S26 sequence produced by Family Tree DNA [4].  FTDNA assumes a human sequence and uses human primers, so DNA of any other species present would not be sequenced. 
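The percentage follows directly from the numbers given earlier, as this small Python check shows.  The helper name `percent_of` is just for illustration, and the calculation takes the 5260 bp assembly and the 16,568 bp expected genome length at face value.

```python
def percent_of(part, whole):
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Longest assembly versus the expected mitochondrial genome length.
print(round(percent_of(5260, 16568)))  # 32

# Fraction of the 272 electropherograms that went into that assembly.
print(round(percent_of(47, 272), 1))  # 17.3
```

The second assembly (base positions 13718-15435) lies inside the span of the first (11401-16551), so it adds nothing to the coverage figure.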

These data call into question the mtDNA sequencing and haplogroup designation of Sample 26, and hence also the Ketchum claim that this sample represents a human-unknown primate hybrid.  Most likely, the sample is a black bear contaminated by one or more humans and severely degraded, as shown in my previous blogs. 

References

[1]  Ketchum M. S. et al. (2013) Novel North American hominins: next generation sequencing of three whole genomes and associated studies. DeNovo 1:1.  Online only: http://sasquatchgenomeproject.org/ 

