Tuesday, December 30, 2014

Melba Ketchum’s Experts and Their Mistakes: What’s in a Phylotree


 
“If there had been bear, it would have been found. They BLASTed it against everything. Pig came up as the most prevalent non-human sequence even though none of the labs involved have ever had pig DNA in the labs.” Dr. Melba Ketchum, DVM

“Humans match a banana 39%.” Dr. Todd Disotell, Prof. of Anthropology, NYU

“Our community suffers from a lack of credibility, and I think conscientious self-policing is part of improving that.” Tyler Huggins, Bigfoot Researcher


 
Abstract 

Results of the database searches of three known consultants of Dr. Melba Ketchum, DVM,  are evaluated in the context of the Ketchum sasquatch DNA paper (see link at right).  Fundamental errors of three types were found: 1) "parsing" the nDNA sequences into small segments before conducting searches, 2) not searching all databases, especially since the NCBI Nucleotide Database has very limited bear data, and 3) (unbelievably) limiting searches to human.  New phylotrees are generated and compared to those of Ketchum (her Supp. Figs. 5 and 6) in the light of known taxonomy/phylogeny, with the result that the latter are improbable or severely truncated/limited.  The basis (specific database searches) for the Ketchum phylotrees remains unknown.   Results confirm previous findings (see first paper at right) that Sample 26 is a bear, Sample 31 is human but severely contaminated, and Sample 140 is a dog.

********************************************************************

 
After all is said and done, those who have studied the Ketchum DNA paper may wonder: How can different scientists reach such different conclusions concerning the nDNA sequences of Samples 26, 31 and 140, especially since they all use the same BLAST™ search software against the same NCBI databases? Earlier this year Melba Ketchum shared her “experts’” methodology and results with me. All of them wanted to remain anonymous; also I will not reveal anything about their institutional affiliations (which would surprise you). However, in spite of Melba’s request that I not share any of this, I believe the basics need to be made known so that the greater bigfoot community can get closure on her paper. It’s been almost two years since it came out. People have had adequate time to rethink their results and conclusions and either defend their claims or recant. Except for Melba Ketchum (who never rebuts with relevant details), the coauthors and consultants have remained silent. Science has no place for anonymity and silence.  Debate makes science self-correcting.  People should stand up for their findings and “man up” (or the female equivalent) to criticism, and if it’s unjustified, rebut it.  Failing this, they beg serious questions.

I was given results of three Ketchum consultants (1. to 3. below), none of whom were coauthors of the paper.  Quotations are from these three consultants.

Consultant 1: “Then I BLAST the sequence into NCBI (not signed in- so nothing exposed). I wish I could send this tree to Tyler Huggins to show how there is absolutely NO BEAR of any kind :-).” 

(Note: Tyler Huggins, Bart Cutino, and Bryan Sykes all confirmed by independent lab analyses that Sample 26 was black bear – see my blog “Ketchum Sample 26, The Smeja Kill: Independent Lab Reports”)

The phylotree this consultant refers to is obtained from the results of a single BLAST™ search of Sample 26, so if the search is too restrictive or the database is incomplete (e.g. important data reside in other databases), the phylotree will not be representative. It’s clear that Melba and Co. searched only the NCBI “Nucleotide” database. Her paper says so in the Supplementary Materials and Methods Section: “Both the assembled reads and the de novo assembly contigs were fed to BLAST version 2.2.26 (blast.ncbi.nlm.nih.gov) with the following settings: database Nucleotide collection (nr/nt)….” (emphasis mine).

Before the paper was published there were no polar bear data in the Nucleotide database, and black bear data were extremely limited there. Only after I searched other NCBI databases (Transcriptome Shotgun Assembly, Reference Genomic Sequences, Genomes, Whole Genome Shotgun Assembly) did the match to bears become evident. My phylotree appears in Figure 1, based on a Sample 26 search of the Reference Genomic Sequences (RS) database. Figures 2 and 3 below are Ketchum Supplementary Figures 5 and 6, respectively, redrawn so you can read them (something she should have done). Her paper gives no indication of which tree goes with which sample. Ketchum Supplementary Figure 4 was entirely primate, including human, and need not be reproduced here.


 
 
  Figure 1.  Sample 26 vs. Reference Genomic Sequences 




    Fig. 1.  Phylotree and distances from BLAST(TM) search of Sample 26 against Reference Genomic Sequences Database.  Scientific (Latin) names, tree branches, scale, and color key are unchanged, except for // used to compact the branch to chinchilla.  Abbreviated common names in bold.  Directions to mammals and nonmammals added for perspective.  Distances along branches and scale are fractions of nonmatching base pairs.




Fig. 2.  Unknown sample number was not stated in the Ketchum paper.  Numbers are distances to the unknown, calculated from the original Ketchum Supp. Fig. 5.



Fig. 3.   Unknown sample number was not stated in the Ketchum paper.  Numbers are distances to the unknown, calculated from the original Ketchum Supp. Fig. 6.  Human distances are much greater than 0.02 claimed in the Ketchum paper and indicate a relatively distant relationship to the unknown sample.       

Sample 26
 

The phylotree in Fig. 1 is based on a Sample 26 search against the RS Database, limited to the 50 best overall hits with duplicates removed,  yielding 35 unique species hits over whole genomes.  In the RS database species have roughly equal amounts of data represented, i.e. whole genomes, although the number of species is much less than in the Nucleotide Database. Fortunately, the mammalian orders and families are fairly well represented. The species relationships in Fig. 1 are virtually identical to the NCBI Taxonomy Database (see link at right). Therefore the placement of Sample 26 within this phylotree is well founded. I also searched Sample 26 against the Nucleotide Database. The resulting phylotree was virtually the same as Fig. 1 except there were no bears among the top 50 hits. Fig. 1 is not at all like any of the Ketchum phylotrees, her Supp. Fig. 4, or my Figs. 2 and 3. Bears are obviously under-represented in the Nucleotide Database.  This was the Ketchum Team's undoing.  
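The "50 best hits, duplicates removed" step can be sketched in a few lines. This is a minimal illustration, not the procedure actually used: the `Hit` record, the `best_hit_per_species` helper, and the sample values are all hypothetical, and it assumes hits are ranked by bit score.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    species: str   # scientific name reported for the hit
    score: float   # bit score (higher is better)


def best_hit_per_species(hits, top_n=50):
    """Keep the top_n hits by score, then one best hit per species."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)[:top_n]
    best = {}
    for hit in ranked:
        # ranked is in descending score order, so the first hit seen
        # for a species is its best one
        best.setdefault(hit.species, hit)
    return list(best.values())


# Made-up hits for illustration only (not results from any sample)
hits = [
    Hit("Species A", 900.0),
    Hit("Species A", 850.0),
    Hit("Species B", 870.0),
    Hit("Species C", 400.0),
]
unique = best_hit_per_species(hits, top_n=3)
print([(h.species, h.score) for h in unique])
```

With `top_n=3` the weakest hit is cut before de-duplication, which is why fewer species than hits survive, just as 50 hits reduce to 35 species in the search described above.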

The ridiculous phylotrees in Ketchum Supplementary Figures 5 and 6, redrawn here as Fig. 2 and Fig. 3, respectively, are the result of searching the wrong database or a poor selection of search parameters. In Fig. 2 the mouse and human are equally distant from S26, and fish and the chicken are less related but still more related than any other species except mouse and human. Where’s everything else? Given the well-established phylotree/taxonomy of all known species, this is impossible. Simply stated, chicken, fish, mouse, and human are too distantly related to each other to be, as a group, the most related species to the Ketchum Sample 26. Something is wrong; I’m not sure what, but the answer is not that a new species has been discovered, because it should fit somewhere in the known evolutionary tree of life like all the other known species on the planet.

The distances to bears are much shorter in Fig. 1 than to any species in Supp. Fig. 4 or Fig. 2, indicating better matches and a closer phylogenetic relationship to Sample 26. Sample 26 IS A BEAR. “ :-).” Chicken, mouse, and fish don’t even appear among the top 35 species in Fig. 1, because they are too phylogenetically distant.

Figure 3 is also impossible:  only a mouse and a human as closest relatives - and not very close to the unknown at all compared to bears in Fig. 1. This Ketchum search is severely truncated/limited somehow. I don’t know how or why.

Only in August 2014, a year and a half after the publication of the Ketchum paper, was the complete polar bear genome sequence added to the RS database and other polar bear entries added to the Nucleotide database. Nevertheless, there was sufficient polar bear and panda (the closest black bear surrogates) data available to the Ketchum Team to have reached a bear conclusion. Her consultants just didn’t look in the right places (other databases).

Sample 31

Interestingly, the phylotree resulting from searching Sample 31 (the human) against either the Nucleotide Database or the RS Database contains fungus, bacteria, flatworms, and others, but no mammals, proving that the sample is contaminated and also that phylotrees of mixed samples may not be representative of the whole sample. Melba was aware of these results when I spoke to her in 2013, yet she continues to vigorously deny any contamination (see her recent YouTube video). Limited to mammals, the phylotree is very much like Ketchum Fig. 6 (Fig. 3 here) – only human and mouse.  Limited to primates, it showed only human. Searched against the RS Database, the results were similar to Supp. Fig. 4: a phylotree of primates. Since all the above results are not consistent, this is a mixed, i.e. contaminated, sample.


Sample 140

 Finally, the phylotree of Sample 140 against  either the Nucleotide or the RS database looks like Fig. 4. The dog is clearly the best match, and other species are in relatively correct positions based on known taxonomy. You’ll recognize most of them from Fig. 1. 


   Figure 4.  Sample 140 vs. Nucleotide


 Fig. 4.  Same format as Fig. 1.  Abbreviated common names in bold.  See Fig. 1 for other common names.

 
 Consultant 2: a) “I…did a blast analysis on the three fasta contig files described below after parsing them into small substrings.” b) “…the results of blastn queries against a target db containing 4 different versions of the human genome assembly downloaded from NCBI…” (emphasis mine)

Two errors in methodology are apparent here: a) “parsing” a long sequence into small (60 bases here) substrings before searching against a database, and b) searching only the human genome. The first step (a) increases the chances of a match with an unrelated species, especially since many important genes are conserved. The shorter the string, the less it discriminates. Information is lost (how the substrings are connected). Secondly (b), how can you identify a species if you only search against a single favorite candidate (human here)? Talk about a preconceived notion!! Match statistics are useless without comparisons. The Sample 26 results did not at all suggest a human, or even a primate, origin. The 94.2%ID match against the human genome was not good enough for species identification (should be 99+%, or 97-98% for same family). My Table 1 in the first paper (at right) also showed primates, including human, to match around 94%ID, which is not good enough for a match compared to polar bear at around 98-99+% and black bear with limited data at 100%ID (my Tables 1 and 3 in the first paper).
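A toy example shows why short substrings discriminate less. Everything here is invented for illustration: the sequences, the 10-base chunk size (the consultant used 60), and the helper names. Exact substring matching also stands in for a real BLAST alignment, which it is not.

```python
def split_into_chunks(seq, size):
    """Cut seq into consecutive, non-overlapping pieces of length size."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def species_containing(chunk, references):
    """Names of reference sequences that contain chunk exactly."""
    return sorted(name for name, ref in references.items() if chunk in ref)


# Toy sequences, invented for illustration. Both "species" share the
# conserved middle block but differ in the flanking regions.
conserved = "ATGGCCAAGT"
references = {
    "species_a": "TTACGGATCA" + conserved + "GGCTTAACCG",
    "species_b": "CCGTATTGAC" + conserved + "AATCGGTTCA",
}
query = references["species_a"]  # the "unknown" is really species_a

# Whole query: only one reference contains it.
whole = species_containing(query, references)

# Same query cut into 10-base chunks: the conserved chunk hits both.
per_chunk = [species_containing(c, references)
             for c in split_into_chunks(query, 10)]

print(whole)
print(per_chunk)
```

Searched whole, the query points to a single species. Once it is cut up, the conserved chunk matches both, and nothing in the output records that this chunk sat between two species-specific flanks.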


Similarly, for Sample 140, this consultant found 94.7%ID against the human genome.  I found 99+%ID match with the dog.

Alas, both the consultant and I got 99+%ID match against the human for Sample 31.  I searched all species; he only searched against human.

“Quote # three below was the genome overall, not chromosome 11 but the random sequences were taken from the raw data without assembly and done by a major university genomics center.”   Dr. Melba Ketchum, DVM

Consultant 3: “Randomly selected subsets of reads were taken from each of the data set and compared to the database of known sequence (the NR database) using BLAST software. A small percentage of reads were related to known mammalian sequences: 1.2% of reads from k-26 matched consistently to pig.” (Emphasis mine)

“Randomly selected subsets” can lead to unrepresentative results. I know; I’ve seen it tried on this very Sample 26 (called k-26 by the consultant). Why omit good data, when the means are available to analyze it all? (Hint: So you can make an “armchair” call with much less effort.) Also, once again, searching only the Nucleotide (NR) database has led to an incomplete, incorrect result. If “only a small percentage of reads were related to known mammalian sequences…” wouldn’t you rethink the search strategy? Maybe search another database? Not this “expert.” Results like this led Melba Ketchum to conclude that a new species was discovered. She did not understand that a new species has to resemble some existing ones if you believe in evolution and phylotrees. The branches are all connected. There are no broken off (unrelated) branches. Finally, the 1.2% of reads which match pig is a red herring. Such a small percentage has no significance, especially if no comparison is made to other species in the correct databases. Just so that the pig is not further maligned, I matched Sample 26 to pig and compared the 10 best matches by score to the polar bear over the same sequence ranges in Table 1. The bear wins handily every time. There is NO PIG. It’s A BEAR, but I’m repeating myself. Also see in the phylotree in Figure 1 above that the pig is not a good match compared to bears.
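The head-to-head comparison amounts to scoring both candidates over the same query ranges and counting who wins each range. The sketch below uses placeholder numbers only; they are NOT the values in Table 1, and the `head_to_head` helper and the dictionary layout are hypothetical.

```python
def head_to_head(ranges, first, second):
    """Count, per query range, which candidate has the higher %ID."""
    wins = {first: 0, second: 0, "tie": 0}
    for row in ranges:
        a, b = row[first], row[second]
        if a > b:
            wins[first] += 1
        elif b > a:
            wins[second] += 1
        else:
            wins["tie"] += 1
    return wins


# Placeholder %ID values, NOT the numbers from Table 1.
# Each row is one query range (start, end) scored against both species.
ranges = [
    {"start": 1, "end": 500, "pig": 90.0, "polar_bear": 98.0},
    {"start": 501, "end": 900, "pig": 88.5, "polar_bear": 99.0},
    {"start": 901, "end": 1500, "pig": 91.0, "polar_bear": 97.5},
]
print(head_to_head(ranges, "pig", "polar_bear"))
```

The point of the design is that each range is judged against both species. A match percentage for one candidate alone says nothing about whether another candidate fits better.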

  
   Table 1. 


Table 1.  Headings: Accession = NCBI GenBank accession.  %ID = percentage of matching bases over the matching sequence range.  LEN = matching sequence length.  MIS = number of mismatching bases.  GAP = gaps in alignment.  Start = Sample 26 sequence starting position.  End = Sample 26 sequence ending position.  SCORE = overall match rating; see BLAST(TM) website at right.
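For readers unfamiliar with these columns, here is a minimal sketch of how LEN, MIS, GAP and %ID relate for one aligned pair. It assumes GAP counts every gapped column (BLAST reports can instead count gap openings), and the `alignment_stats` helper and the two short rows are invented for illustration.

```python
def alignment_stats(query_aln, subject_aln):
    """Return LEN, MIS, GAP and %ID for two gapped rows of equal length."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned rows must have the same length")
    length = len(query_aln)
    gaps = 0
    mismatches = 0
    for q, s in zip(query_aln, subject_aln):
        if q == "-" or s == "-":
            gaps += 1          # assumption: count each gapped column
        elif q != s:
            mismatches += 1
    matches = length - gaps - mismatches
    return {
        "LEN": length,
        "MIS": mismatches,
        "GAP": gaps,
        "%ID": 100.0 * matches / length,
    }


# Invented 10-column alignment: one mismatch, one gap
stats = alignment_stats("ACGTACGT-A", "ACGTTCGTGA")
print(stats)
```

Here 8 of 10 columns match, so %ID is 80. The same arithmetic links %ID to the tree distances in Fig. 1, which are fractions of nonmatching base pairs.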


I seriously doubt that whole-genome results would differ from the Chromosome 11 results above. Prove me wrong, Consultant 3, but please search the RS database first.

In spite of their supposed credentials, highly touted by Melba, these three consultants, and I suspect others as well, MISSED THE BOAT. They BLEW IT. With friends like these Melba needs no enemies, but unfortunately she still thinks the world of them:

“If I could, I would scream these analyses from the rooftops but I am ethical and will respect their wishes.” (to remain anonymous)    Dr. Melba Ketchum, DVM


 
People don’t listen to screamers, Melba. They might listen to a reasoned rebuttal if you have one.