
Tuesday, December 30, 2014

Melba Ketchum’s Experts and Their Mistakes: What’s in a Phylotree


 
“If there had been bear, it would have been found. They BLASTed it against everything. Pig came up as the most prevalent non-human sequence even though none of the labs involved have ever had pig DNA in the labs.” Dr. Melba Ketchum, DVM

“Humans match a banana 39%.” Dr. Todd Disotell, Prof. of Anthropology, NYU

“Our community suffers from a lack of credibility, and I think conscientious self-policing is part of improving that.” Tyler Huggins, Bigfoot Researcher


 
Abstract 

Results of the database searches of three known consultants of Dr. Melba Ketchum, DVM,  are evaluated in the context of the Ketchum sasquatch DNA paper (see link at right).  Fundamental errors of three types were found: 1) "parsing" the nDNA sequences into small segments before conducting searches, 2) not searching all databases, especially since the NCBI Nucleotide Database has very limited bear data, and 3) (unbelievably) limiting searches to human.  New phylotrees are generated and compared to those of Ketchum (her Supp. Figs. 5 and 6) in the light of known taxonomy/phylogeny, with the result that the latter are improbable or severely truncated/limited.  The basis (specific database searches) for the Ketchum phylotrees remains unknown.   Results confirm previous findings (see first paper at right) that Sample 26 is a bear, Sample 31 is human but severely contaminated, and Sample 140 is a dog.

********************************************************************

 
After all is said and done, those who have studied the Ketchum DNA paper may wonder: How can different scientists reach such different conclusions concerning the nDNA sequences of Samples 26, 31 and 140, especially since they all use the same BLAST™ search software against the same NCBI databases? Earlier this year Melba Ketchum shared her “experts’” methodology and results with me. All of them wanted to remain anonymous; also I will not reveal anything about their institutional affiliations (which would surprise you). However, in spite of Melba’s request that I not share any of this, I believe the basics need to be made known so that the greater bigfoot community can get closure on her paper. It’s been almost two years since it came out. People have had adequate time to rethink their results and conclusions and either defend their claims or recant. Except for Melba Ketchum (who never rebuts with relevant details), the coauthors and consultants have remained silent. Science has no place for anonymity and silence.  Debate makes science self-correcting.  People should stand up for their findings and “man up” (or the female equivalent) to criticism, and if it’s unjustified, rebut it.  Failing this, they beg serious questions.

I was given results of three Ketchum consultants (1. to 3. below), none of whom were coauthors of the paper.  Quotations are from these three consultants.

Consultant 1: “Then I BLAST the sequence into NCBI (not signed in- so nothing exposed). I wish I could send this tree to Tyler Huggins to show how there is absolutely NO BEAR of any kind :-).” 

(Note: Tyler Huggins, Bart Cutino, and Bryan Sykes all confirmed by independent lab analyses that Sample 26 was black bear – see my blog “Ketchum Sample 26, The Smeja Kill: Independent Lab Reports”)

The phylotree this consultant refers to is obtained from the results of a single BLAST™ search of Sample 26, so if the search is too restrictive or the database is not complete (e.g. important data in other databases), the phylotree will not be representative. It’s clear that Melba and Co. only searched the NCBI “Nucleotide” database. Her paper says so in the Supplementary Materials and Methods Section: “Both the assembled reads and the de novo assembly contigs were fed to BLAST version 2.2.26 (blast.ncbi.nlm.hih.gov) with the following settings: database Nucleotide collection (nr/nt)….” (emphasis mine).

Before the paper was published there were no polar bear data in the Nucleotide database, and black bear data were extremely limited there. Only after I searched other NCBI databases (Transcriptome Shotgun Assembly, Reference Genomic Sequences, Genomes, Whole Genome Shotgun Assembly) did the match to bears become evident. My phylotree appears in Figure 1, based on a Sample 26 search of the Reference Genomic Sequences (RS) database. Figures 2 and 3 below are Ketchum Supplementary Figures 5 and 6, respectively, redrawn so you can read them (something she should have done). Her paper gives no indication of which tree goes with which sample. Ketchum Supplementary Figure 4 was entirely primate, including human, and need not be reproduced here.


 
 
  Figure 1.  Sample 26 vs. Reference Genomic Sequences 




    Fig. 1.  Phylotree and distances from BLAST(TM) search of Sample 26 against Reference Genomic Sequences Database.  Scientific (Latin) names, tree branches, scale, and color key are unchanged, except for // used to compact the branch to chinchilla.  Abbreviated common names in bold.  Directions to mammals and nonmammals added for perspective.  Distances along branches and scale are fractions of nonmatching base pairs.




             Fig. 2.  Unknown sample number was not stated in the Ketchum paper.  Numbers  are distances to the unknown, calculated from the original Ketchum Supp. Fig. 5.         



Fig. 3.   Unknown sample number was not stated in the Ketchum paper.  Numbers are distances to the unknown, calculated from the original Ketchum Supp. Fig. 6.  Human distances are much greater than 0.02 claimed in the Ketchum paper and indicate a relatively distant relationship to the unknown sample.       

Sample 26
 

The phylotree in Fig. 1 is based on a Sample 26 search against the RS Database, limited to the 50 best overall hits with duplicates removed, yielding 35 unique species hits over whole genomes. In the RS database each species is represented by roughly the same amount of data, i.e. a whole genome, although it covers far fewer species than the Nucleotide Database. Fortunately, the mammalian orders and families are fairly well represented. The species relationships in Fig. 1 are virtually identical to those in the NCBI Taxonomy Database (see link at right). Therefore the placement of Sample 26 within this phylotree is well founded. I also searched Sample 26 against the Nucleotide Database. The resulting phylotree was virtually the same as Fig. 1 except that there were no bears among the top 50 hits. Fig. 1 is not at all like any of the Ketchum phylotrees (her Supp. Fig. 4, or my Figs. 2 and 3). Bears are obviously under-represented in the Nucleotide Database. This was the Ketchum Team's undoing.
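The filtering step described above (take the best 50 hits, then keep one entry per species) can be sketched in a few lines. This is only an illustration of the idea, not what the NCBI website does internally; the function name, the tuple layout, and the scores are hypothetical.

```python
def unique_species_hits(hits, top_n=50):
    """Keep the best-scoring hit per species among the top_n hits.

    hits: iterable of (species, score) tuples; higher score = better match.
    """
    # Rank all hits by score, best first, and keep only the top_n.
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)[:top_n]
    best = {}
    for species, score in ranked:
        # The first time a species appears it is its best hit; skip repeats.
        if species not in best:
            best[species] = score
    # Dicts preserve insertion order, so the result stays ranked best-first.
    return list(best.items())


# Made-up scores, for illustration only.
example_hits = [
    ("Ursus maritimus", 980.0),
    ("Ailuropoda melanoleuca", 910.0),
    ("Ursus maritimus", 975.0),
    ("Canis lupus familiaris", 700.0),
    ("Ailuropoda melanoleuca", 650.0),
]
unique = unique_species_hits(example_hits, top_n=50)
```

Because duplicates are dropped after the cut to the top hits, the number of species is always less than or equal to the number of hits, which is how 50 hits can shrink to 35 species.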

The ridiculous phylotrees in Ketchum Supplementary Figures 5 and 6, redrawn here as Fig. 2 and Fig. 3, respectively, are the result of searching the wrong database or a poor selection of search parameters. In Fig. 2 the mouse and human are equally distant from Sample 26, and the fish and chicken are less closely related but still closer than any other species. Where’s everything else? Given the well-established phylotree/taxonomy of all known species, this is impossible. Simply stated, chicken, fish, mouse, and human are too distantly related to each other to be, as a group, the species most closely related to the Ketchum Sample 26. Something is wrong; I’m not sure what, but the answer is not that a new species has been discovered, because a new species should fit somewhere in the known evolutionary tree of life like all the other known species on the planet.

The distances to bears are much shorter in Fig. 1 than those to any species in Supp. Fig. 4 or Fig. 2, indicating better matches and a closer phylogenetic relationship to Sample 26. Sample 26 IS A BEAR. “ :-).” Chicken, mouse, and fish don’t even appear among the top 35 species in Fig. 1, because they are too phylogenetically distant.

Figure 3 is also impossible:  only a mouse and a human as closest relatives - and not very close to the unknown at all compared to bears in Fig. 1. This Ketchum search is severely truncated/limited somehow. I don’t know how or why.

Only in August 2014, a year and a half after the publication of the Ketchum paper, was the complete polar bear genome sequence added to the RS database and other polar bear entries added to the Nucleotide database. Nevertheless, there was sufficient polar bear and panda (the closest black bear surrogates) data available to the Ketchum Team to have reached a bear conclusion. Her consultants just didn’t look in the right places (other databases).

Sample 31

Interestingly, the phylotree resulting from searching Sample 31 (the human) against either the Nucleotide Database or the RS Database contains fungus, bacteria, flatworms, and others, but no mammals, proving that the sample is contaminated and also that phylotrees of mixed samples may not be representative of the whole sample. Melba was aware of these results when I spoke to her in 2013, yet she continues to vigorously deny any contamination (see her recent YouTube video). Limited to mammals, the phylotree is very much like Ketchum Supp. Fig. 6 (Fig. 3 here) – only human and mouse. Limited to primates, it showed only human. Searched against the RS Database, the results were similar to Supp. Fig. 4: a phylotree of primates. Since the above results are not consistent with one another, this is a mixed, i.e. contaminated, sample.


Sample 140

 Finally, the phylotree of Sample 140 against  either the Nucleotide or the RS database looks like Fig. 4. The dog is clearly the best match, and other species are in relatively correct positions based on known taxonomy. You’ll recognize most of them from Fig. 1. 


   Figure 4.  Sample 140 vs. Nucleotide


 Fig. 4.  Same format as Fig. 1.  Abbreviated common names in bold.  See Fig. 1 for other common names.

 
 Consultant 2: a) “I…did a blast analysis on the three fasta contig files described below after parsing them into small substrings.” b) “…the results of blastn queries against a target db containing 4 different versions of the human genome assembly downloaded from NCBI…” (emphasis mine)

Two errors in methodology are apparent here: a) “parsing” a long sequence into small (60 bases here) substrings before searching against a database, and b) searching only the human genome. The first step (a) increases the chances of a match with an unrelated species, especially since many important genes are conserved. The shorter the string, the less it discriminates. Information is lost (how the substrings are connected). Secondly (b), how can you identify a species if you only search against a single favorite candidate (human here)? Talk about a preconceived notion!! Match statistics are useless without comparisons. The Sample 26 results did not at all suggest a human, or even a primate origin. The 94.2%ID match against the human genome was not good enough for species identification (it should be 99+%, or 97-98% for the same family). My Table 1 in the first paper (at right) also showed primates, including human, to match around 94%ID, which is not good enough for a match compared to polar bear at around 98-99+% and black bear with limited data at 100%ID (my Tables 1 and 3 in the first paper).
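To see why short substrings discriminate poorly, consider a rough model in which mismatches fall independently along an alignment. That independence is an assumption made here purely for illustration (real mismatches cluster, and conserved genes make matters worse), and the function name is hypothetical. Under this model, the chance that a 60-base substring taken from a 94%ID alignment contains no mismatch at all is 0.94^60, or about 2.4%, so a long query chopped into 60-base pieces will produce many pieces that match the wrong species perfectly.

```python
from math import comb


def prob_at_most_mismatches(identity, length, max_mismatches):
    """Binomial probability that a window of `length` bases has at most
    `max_mismatches` mismatches, given a per-base match rate `identity`.

    Assumes mismatches are independent, which is a simplification.
    """
    miss = 1.0 - identity
    return sum(
        comb(length, k) * miss**k * identity ** (length - k)
        for k in range(max_mismatches + 1)
    )


# 94%ID overall (roughly the human-genome figure quoted above), 60-base pieces.
p_perfect = prob_at_most_mismatches(0.94, 60, 0)  # piece matches at 100%
p_one_off = prob_at_most_mismatches(0.94, 60, 1)  # piece has 0 or 1 mismatch
print(f"perfect 60-mers: {p_perfect:.3f}; at most one mismatch: {p_one_off:.3f}")
```

The same sequence searched whole reports a single 94%ID figure, which is why keeping the substrings connected preserves the information needed to tell species apart.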


Similarly, for Sample 140, this consultant found 94.7%ID against the human genome.  I found 99+%ID match with the dog.

Alas, both the consultant and I got a 99+%ID match against the human for Sample 31. I searched all species; he searched only against human.

“Quote # three below was the genome overall, not chromosome 11 but the random sequences were taken from the raw data without assembly and done by a major university genomics center.”   Dr. Melba Ketchum, DVM

Consultant 3: “Randomly selected subsets of reads were taken from each of the data set and compared to the database of known sequence (the NR database) using BLAST software. A small percentage of reads were related to known mammalian sequences: 1.2% of reads from k-26 matched consistently to pig.” (Emphasis mine)

“Randomly selected subsets” can lead to unrepresentative results. I know; I’ve seen it tried on this very Sample 26 (called k-26 by the consultant). Why omit good data, when the means are available to analyze it all? (Hint: So you can make an “armchair” call with much less effort.) Also, once again, searching only the Nucleotide (NR) database has led to an incomplete, incorrect result. If “only a small percentage of reads were related to known mammalian sequences…” wouldn’t you rethink the search strategy? Maybe search another database? Not this “expert.” Results like this led Melba Ketchum to conclude that a new species was discovered. She did not understand that a new species has to resemble some existing ones if you believe in evolution and phylotrees. The branches are all connected. There are no broken off (unrelated) branches. Finally, the 1.2% of reads which match pig is a red herring. Such a small percentage has no significance, especially if no comparison is made to other species in the correct databases. Just so that the pig is not further maligned, I matched Sample 26 to pig and compared the 10 best matches by score to the polar bear over the same sequence ranges in Table 1. The bear wins handily every time. There is NO PIG. It’s A BEAR, but I’m repeating myself. Also see in the phylotree in Figure 1 above that the pig is not a good match compared to bears.

  
   Table 1. 


Table 1.  Headings: Accession = NCBI GenBank accession number.  %ID = percentage of matching bases over the matching sequence range.  LEN = matching sequence length.  MIS = number of mismatching bases.  GAP = gaps in alignment.  Start = Sample 26 sequence starting position.  End = Sample 26 sequence ending position.  SCORE = overall match rating; see BLAST(TM) website at right.
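The columns in Table 1 correspond closely to the default tabular output of command-line BLAST+ (`-outfmt 6`), whose twelve tab-separated fields are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. Assuming results were exported in that format (the table here came from the NCBI website, so this is an assumption), one row could be read as sketched below. The accession and the numbers in the sample row are invented for illustration and are not taken from Table 1.

```python
from collections import namedtuple

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid sseqid pident length mismatch gapopen "
          "qstart qend sstart send evalue bitscore").split()
Hit = namedtuple("Hit", FIELDS)


def parse_hit(line):
    """Turn one tab-separated -outfmt 6 line into a typed Hit record."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
    return Hit(
        qseqid=cols[0], sseqid=cols[1],
        pident=float(cols[2]),                   # %ID
        length=int(cols[3]),                     # LEN
        mismatch=int(cols[4]),                   # MIS
        gapopen=int(cols[5]),                    # GAP (number of gap openings)
        qstart=int(cols[6]), qend=int(cols[7]),  # Start, End on the query
        sstart=int(cols[8]), send=int(cols[9]),  # positions on the database hit
        evalue=float(cols[10]),
        bitscore=float(cols[11]),                # SCORE
    )


# Hypothetical row: placeholder accession and invented numbers.
row = "sample26\tXX_000000.1\t99.00\t1000\t10\t0\t1\t1000\t501\t1500\t0.0\t1790"
hit = parse_hit(row)
```

Comparing two species over the same Start-End range, as Table 1 does for pig and polar bear, then amounts to comparing the `pident` and `bitscore` fields of two such records.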


I seriously doubt that the Chromosome 11 results above are unrepresentative of the whole genome. Prove me wrong, Consultant 3, but please search the RS database first.

In spite of their supposed credentials, highly touted by Melba, these three consultants, and I suspect others as well, MISSED THE BOAT. They BLEW IT. With friends like these Melba needs no enemies, but unfortunately she still thinks the world of them:

“If I could, I would scream these analyses from the rooftops but I am ethical and will respect their wishes.” (to remain anonymous)    Dr. Melba Ketchum, DVM


 
People don’t listen to screamers, Melba. They might listen to a reasoned rebuttal if you have one.

Monday, December 29, 2014

Melba Ketchum Shows Sample 140 Degradation in her YouTube Video


Abstract

While attempting to dispel contamination in Sample 140, Melba Ketchum demonstrated a failure to amplify and sequence the correct gene (cyt b). Similar situations were demonstrated for Samples 33 and 35. The likely cause is mispriming or self-priming of degraded DNA.


*****************************************************************************


Dr. Melba Ketchum, DVM, has opened a YouTube channel which presently contains six videos: two book promotions, a story about dogman, a press conference, a related study, and an attempt to show that her study samples were uncontaminated. The last video includes good explanations of DNA methodology and is worth watching by any interested party. Unfortunately for her, it also contains an example which shows that her sample was degraded, a point she has consistently denied, in spite of strong evidence to the contrary (see blogs at right). Does she ever check anything?

The example of interest here is an electropherogram and sequence at 30.01 minutes into the video, purportedly from the Sample 140 cytochrome b (cyt b) gene. This gene occupies positions 14747 through 15887 of the 16569 bp human mitochondrial genome as exemplified by the rCRS standard (see Phylotree at right). I did a BLAST™ search of this purported cyt b sequence from the video:

CTTGTGCGGGATATTGATTTCACGGAGGATGGTGG
TCAAGGGACCCCTATCTGAGGGGGGTCATCCATGG
 GGGCGAGAAGGGATTTG

with the result of a 100%ID match to the human 16434-16348 minus strand, covering the control region (16434-16384) and the HV-1 region (16383-16348), clearly 461 bp distant from the purported cyt b region (14747-15887), which was not sequenced. Primers are normally designed to sequence the plus strand for consistency, so something is clearly wrong here.
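The arithmetic here is easy to check. The sketch below uses only numbers stated in this post (the read from the video, the 16434-16348 match range, and the 14747-15887 cyt b range); the variable names are mine. It also reverse-complements the read, which is how a minus-strand match would be written in plus-strand orientation.

```python
# The purported cyt b read from the video, joined into one string.
read = (
    "CTTGTGCGGGATATTGATTTCACGGAGGATGGTGG"
    "TCAAGGGACCCCTATCTGAGGGGGGTCATCCATGG"
    "GGGCGAGAAGGGATTTG"
)

MATCH_HIGH, MATCH_LOW = 16434, 16348  # minus-strand match on human mtDNA
CYTB_START, CYTB_END = 14747, 15887   # cyt b positions on the rCRS

# Inclusive span of the match; it should equal the read length.
match_span = MATCH_HIGH - MATCH_LOW + 1

# Distance from the end of cyt b to the nearest matched position.
offset_from_cytb = MATCH_LOW - CYTB_END

# True only if the matched range touches the cyt b range at all.
overlaps_cytb = MATCH_LOW <= CYTB_END and MATCH_HIGH >= CYTB_START

# Plus-strand orientation of the read (reverse complement).
plus_strand = read.translate(str.maketrans("ACGT", "TGCA"))[::-1]

print(len(read), match_span, offset_from_cytb, overlaps_cytb)
```

A read that really came from cyt b would report an overlap with 14747-15887; this one lies entirely outside it.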

This means that the sample was misprimed, probably self-primed due to degradation. Consistent with the Ketchum electron micrographs showing regions of single-strand DNA, self-priming occurs when one strand primes the other, with the intervening gap of single strand DNA (here in the plus strand) serving as the template for the sequencing of the priming strand (minus here) in the gap. The cyt b primers were not effective in this case; the cyt b region was not amplified or sequenced because of the favored reactions on the self-primed strand in the control/HV-1 regions. This occurs all the time for microbially degraded or contaminated samples, and should have been a flag to the Ketchum team of “experts.” Peer reviewers pointed this out:

“…when they are likely explained by DNA degradation or contamination.”  

AND

“… in degraded DNA, artifactual amplifications are expected.” 

But what was Melba’s response?  

“The DNA was not degraded as we had a yield gel of the raw DNA showing clear bands (Figure 7) not smears.” (emphasis mine).

 Actually in her Figure 10, Sample 140 does show smears (degradation). There are even two major bands. 

Degradation of the type causing self-priming (type 1) may not even show up as smears because relatively short spans of single strands would not change the molecular weight by enough to be visibly distinct from the pure double strand sequence. Smears indicate a continuous range of relatively high molecular weight fragments (type 2) caused by severing both strands in numerous places, not short single strands among normal length double strands.

Also, the purposely degraded blood sample which Melba presents as an example of degradation in Figure 10 is of the second type (above), and in any case it should not have been presented as representative of field degradation in the real samples, as I pointed out in a previous blog. Most of her samples were hairs (not blood) exposed to environmental (not laboratory) conditions of heat and cold, light (especially UV), rain, and microbes at the least. Her lab experiment did not replicate these conditions. However, she says in her paper,

“…the contaminated, hemolyzed, degraded blood sample comprised a standard by which the unknown DNA obtained under more controlled conditions could be evaluated.” (emphasis mine).

“More controlled?” I say unknown field conditions are less controlled. The NCBI calls uncontrolled samples like this “environmental samples,” meaning that their reported DNA sequences cannot be associated with any known organism. Could Melba seriously have thought that her “controlled” laboratory treatment of unknown samples was the only possible source of degradation to which they were exposed? In her paper she makes a lot of the control samples from laboratory personnel but does not take seriously enough other possible sources of contamination or degradation. None of her samples were collected directly under sterile conditions from an animal and immediately preserved, as would normally be the case for a known animal sampled in a laboratory or even a person submitting a test kit to a laboratory by mail.

These results are similar to those found in the TAP1 nuclear gene sequence for Sample 33 found on the Sasquatch Genome Project website (“Supplemental Raw Data” tab). In that case mitochondrial DNA was amplified and sequenced. Clearly the primer did not work there either. This and other anomalies are discussed in my third paper (see at right).

Possibly related is the strange case of Sample 35, the Arizona toenail, described in a previous blog (see at right). In that case Richard Stubstad, using Melba’s alleged cyt b sequence, drew incorrect conclusions from a few database comparisons. However, his sequence of 360 bp did not match the 446 bp length of a cyt b sequence which would be obtained from the stated primer pair MCB 398/MCB 869. This sequence was not published in Stubstad’s brief article (see at right), but we suspect it was not cyt b either.

In the light of the above, I call upon Melba Ketchum to release for public scrutiny all the “cyt b” sequences of the 111 samples in her study, all of which she claimed in her paper to be human.

In summary, while attempting to dispel contamination in Sample 140, Dr. Melba Ketchum, DVM, demonstrated failure to amplify and sequence the correct gene (cyt b), which is likely caused by sample degradation. She should have checked her results before going on YouTube, or in fact before writing her paper.  Melba had very condescending words for Prof. Bryan Sykes, however:

“That's why our study took so long, check and re-check among other things….he should have been more thorough…it’s just that they didn’t delve deep enough.”


Walk the talk, Melba.