Wednesday, November 26, 2014

Ketchum Sample 26, The Smeja Kill: Independent Lab Reports


The sample of flesh and hair collected by Justin Smeja weeks after his reported sasquatch killings has been analyzed for DNA sequences by four labs:
1.  As part of the Ketchum DNA study (Sample 26), by her associates.  They sequenced the entire mtDNA genome and 2.7 M bp of nDNA.
2.  By the Wildlife Forensic DNA Laboratory at Trent University, Peterborough, Ontario, submitted by Tyler Huggins (see link at right).  They sequenced the human HV1 mtDNA region and black bear STR nuclear microsatellite loci.
3.  By DNA Solutions, submitted by Bart Cutino.  They sequenced the human HV1 and HV2 and the black bear cytochrome b mtDNA regions.
4.  By Mitotyping Technologies, submitted by Prof. Bryan Sykes as part of his recent paper (see link at right).  They sequenced the black bear 12S rRNA mtDNA gene.
In addition, I reinterpreted the Ketchum results, but I had no sample and did no new laboratory experiments (see my three papers at right).

All three independent labs (2, 3, and 4) concluded that the sample was from a black bear, as did I based on the published nDNA sequence in the Ketchum paper.  Labs 2 and 3 also found human contamination.  Prof. Sykes cleaned up his sample first, and so saw no human mtDNA, only black bear.  This would seem to put the final nail in the coffin, except that Melba has said (e.g. to me privately) that Justin Smeja sent different known bear samples out in place of Sample 26, so that the results would exonerate him from possible criminal charges for killing something resembling a human.  Is there any way we can check on this claim?  Not exactly, but we can show that the human mtDNA in all three samples is virtually the same: Justin Smeja’s.
Related to this, Scott Carpenter in his blog, “Ketchum DNA Study - Sample 26” (http://www.bf-field-journal.blogspot.com/p/ketchum-dna-study-sample-26.html) made the claim that Melba found H1a as the haplogroup, while Huggins (Scott thought) gave haplogroup A, and Cutino T2 for the human contamination.  Can a sample have three different human haplogroups?  Carpenter said no, and therefore the original sample, Ketchum 26, could not have been split to Huggins and Cutino.  But are these different haplogroups accurately determined?   Unfortunately not.  Each has a problem. 


Scott is correct in that the Huggins report (see at right) lists haplogroup A for the controls and the sample in its Table 1.  However, details further into the report (p. 2) conflict with haplogroup A, which is First American out of Asia: “Further analysis indicates that it is a European haplotype; it occurs with highest frequency in East Europe (11%) and Caucasus (10%) (i.e. it initially originated from near the Caucasus mountains regions between the Black and Caspian Seas).  It is not known to have originally occurred in East Asia, Southeast Asia, Australia, Oceania (i.e. New Zealand, Papua New Guinea, etc.), North America, South America or Central America.”  I asked Tyler Huggins about this discrepancy, and he said that the “A” listed in Table 1 was a placeholder meant to indicate that all four samples, Huggins_1 (the flesh) and 570, 576, and 578 (the last three being buccal swabs from Justin Smeja), had identical haplogroups, not that they belonged to Haplogroup A.  This is poor and confusing reporting on the part of the lab.
The Huggins report indicates that the HV1 region of mtDNA (positions 15999-16400) matched 100% NCBI database accession JQ705199.  Now we have the sequence which we can compare to Cutino’s published HV1 and to the Ketchum S26 mtDNA sequence.  The results are as follows:
 
Compared with rCRS (revised Cambridge Reference Sequence), the HV1 mutations are:
Huggins   16126C  16187T  16294T  16296T  16304C
Cutino    16126C  16187T  16294T  16296T  16304C
Ketchum   16126C  16187T  16294T  16296T  16304C  16313T

Thus, the Huggins and Cutino samples are identical over HV1, and Ketchum differs from these two by one additional mutation, 16313T.
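The HV1 comparison above can be checked mechanically. The following is a minimal sketch in Python, using only the mutation lists given in this post; the function name `private_mutations` is mine and is not taken from any lab report.

```python
# HV1 differences from rCRS as listed in this post, one set per lab.
hv1 = {
    "Huggins": {"16126C", "16187T", "16294T", "16296T", "16304C"},
    "Cutino": {"16126C", "16187T", "16294T", "16296T", "16304C"},
    "Ketchum": {"16126C", "16187T", "16294T", "16296T", "16304C", "16313T"},
}


def private_mutations(profiles):
    """Return, for each lab, the mutations not shared by every profile."""
    shared = set.intersection(*profiles.values())
    return {lab: sorted(muts - shared) for lab, muts in profiles.items()}


print(private_mutations(hv1))
# {'Huggins': [], 'Cutino': [], 'Ketchum': ['16313T']}
```

The same set arithmetic would apply to HV2 if the full Cutino and Ketchum HV2 lists were entered.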

Huggins did not sequence HV2, but we can compare Cutino and Ketchum:
Ketchum has one extra HV2 mutation 384G.
How significant are these two extra mutations (16313T and 384G)?  16313T occurs nowhere in the most recent version (Build 16) of the mtDNA Phylotree (link at right).  384G occurs only twice (haplogroups T1a2a and U5b1h) in the entire Phylotree, which is based on over 20,000 different human mtDNA sequences.  My complete mtDNA sequence interpretation of the Ketchum samples (my second paper at right) showed that Sample 26 was outside the range of known human haplotypes by 16 extra mutations, including 384G and 16313T.  The average number of extra mutations is less than 3 (2.37 based on my subsample of 35 H1a database entries).  Based on statistics of mutation frequencies, the Ketchum sample had a 1 in 224,056,304 chance of being from the known human population; therefore, it has no human haplogroup.  Justin Smeja’s three buccal swabs matched the Huggins sample over HV1 and therefore also the Cutino sample.
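To get a feel for how far 16 extra mutations is from an average of 2.37, here is a rough sketch using a Poisson tail probability. This is an illustrative assumption on my part, not the calculation behind the 1 in 224,056,304 figure (which came from mutation frequency statistics), and it should not be expected to reproduce that number exactly.

```python
import math


def poisson_tail(k, mean, terms=200):
    """Approximate P(X >= k) for a Poisson variable with the given mean.

    Sums the probability mass from k upward; logs are used so that
    large factorials do not overflow.
    """
    return sum(
        math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))
        for i in range(k, k + terms)
    )


# Assumption: extra mutations per sequence are roughly Poisson with mean 2.37.
p = poisson_tail(16, 2.37)
print(f"P(16 or more extra mutations) is about {p:.3g}")
```

Under this simplified model the probability is also vanishingly small, which is the only point the sketch is meant to make.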







Let’s go back to the reported Haplogroups:
Huggins: A (known to be false – see above)
Cutino: T2
Ketchum: reported H1a but equally likely to be H5e – both with 16 extra mutations, not the “one novel SNP” claimed in her paper, Table 2.
 
The Cutino determination T2 was based on the HV1 mutations 16126C, 16294T, and 16304C only; add 16187T and 16296T and the haplogroup is T2b3e, which however does not match the coding region mutations of the Ketchum sample (the coding region was not determined by Huggins or Cutino).  Ketchum said the sample had “one novel SNP” from H1a.  It has 16.  A sample this far from human cannot be correctly haplogrouped, as I pointed out in my second paper.
 
One more piece of data, the Cutino report included a cytochrome b sequence, which matches black bear best and other bears next, no human.  How can one sample produce human mtDNA (HV1, HV2) and bear mtDNA (cyt b)?  It’s the primers – human specific for HV1, HV2 and universal for cyt b, respectively.  The sample is black bear with human impurity.
So, with high probability all three samples had Justin Smeja’s mtDNA over the HV1 region.  All three as well as one more (Sykes’) tested positive for black bear.  Scott, you have to dig into the details more by doing BLAST™ searches and comparisons.  Your analysis was too shallow to reveal the truth.  It relied on contract lab interpretations and misstatements. And unfortunately, contract labs need very close monitoring to get good results.  As one Ketchum Peer Reviewer said, “It is very difficult to manage the quality control and release standards of contract laboratories from the outside.”  Here, one (Huggins’) made misleading/incorrect table entries (haplogroup A), another (Cutino’s) ignored extra mutations (16187T, 16296T) without comment or qualifiers, and a third (Ketchum’s) reported a Haplogroup H1a without commenting on the 16 extra mutations which make it improbably human.

Thanks to Scott Carpenter for raising the issue and making the reports available on his website (link at right).  But sorry, Scott, your conclusion is wrong.  NO THANKS to Melba Ketchum, who refuses to release the Smeja Sample 26 for further testing (e.g. as outlined above by me), after repeated public and private requests by Derek Randles, who submitted it to her.  What is she hiding?  Respectable scientists cooperate with efforts to cross-check their work; the best even solicit it and collaborate.  Regardless of their legitimate points of disagreement, when the results come out differently, they don’t impugn the integrity of other investigators without proof of any wrongdoing.
My take on this sample is that it has no verifiable connection to the Smeja killings. Only Justin Smeja and his hunting companion know what he shot, and it has no fresh, connected, and documented sample to analyze.  Even matching Smeja’s DNA in the various samples doesn’t prove they were all from a single original, only that he contaminated them all. What this issue really needs is for Melba Ketchum to repeat the Huggins STR analysis with black bear primers or release the sample for others to do so.  Then we can see whether the black bear in both samples is the same bear.   I asked her to do this early last year and she ignored me, or at least has not published the results.


Now for the shocker.  The STR analysis with black bear primers in the Huggins report, Figures A7.1 and A7.2, shows more than two peaks (alleles; normally there’s only one from each parent) for each gene, so the bear portion is also contaminated, probably by a second bear, but possibly by Justin’s dog (more research needed here).  Did two bears fight over the sasquatch carcass, one biting or clawing off a piece of flesh from the other and contaminating it?  In the process did Sample 26 also get contaminated with original sasquatch DNA, confusing the whole mtDNA sequencing by Ketchum, which turned out to be improbably human and possibly a mixed result?
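The "more than two peaks" rule is easy to state as code. This is a small sketch with made-up locus names and allele sizes; none of the values are read from the Huggins figures.

```python
def mixed_loci(peaks_by_locus):
    """Return the loci that show more than two distinct alleles (peaks).

    A single diploid animal contributes at most two alleles per locus,
    so three or more distinct peaks suggest a mixture of individuals.
    """
    return sorted(
        locus for locus, peaks in peaks_by_locus.items() if len(set(peaks)) > 2
    )


# Hypothetical allele sizes in base pairs, for illustration only.
example = {
    "locus_A": [180, 184],
    "locus_B": [201, 205, 209],
    "locus_C": [150, 150],
}
print(mixed_loci(example))  # ['locus_B']
```

In practice stutter peaks and other artifacts also need to be ruled out before calling a mixture, which this sketch does not attempt.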





 
     





Monday, November 3, 2014

The Ketchum Peer Reviews




"At least now everything I have said can be authenticated including the ridiculous and biased nature of the reviews. Now people will know the truth and that we did pass peer review."  (Melba Ketchum)
 

 

"Extraordinary claims require extraordinary evidence."  (Anonymous Ketchum Reviewer)

"Extraordinary evidence requires extraordinary review."  (Haskell Hart)

"Hindsight is 20/20."  (Anonymous)

Dr. Melba Ketchum first submitted her famous paper (link at right) to Nature which rejected it on November 1, 2012, based on comments of four reviewers and after she first tried to satisfy their requests in a second round.  She then submitted the paper to Journal of Advanced Multidisciplinary Exploration in Zoology (JAMEZ), which she claims accepted it after some revisions.      

Much has been made of the history of the defunct journal, Journal of Advanced Multidisciplinary Exploration in Zoology (JAMEZ), sometimes called redundantly Journal of Advanced Zoological Exploration in Zoology by Scott Carpenter (see link to his blog at right), and its successor DeNovo, and whether in fact Dr. Melba Ketchum purchased the former to acquire the rights to her "peer reviews" so she could tout them in her own publication DeNovo, where her paper appeared.  This debate is not about science, and will be left to others.  Instead we focus here on the content of the peer reviews, taken as scientific criticism and evaluated at face value in the light of current knowledge of the Ketchum study, based on my in-depth analyses in three papers (see at right).  In so doing we hope to answer the question of whether her work was fairly evaluated.

Nature had four reviewers, one of whom admitted he was not a geneticist (but then neither are Melba or her coauthors).  JAMEZ had only two reviewers, which is not enough for a paper of this import which is likely to be controversial.  All the comments are combined and categorized here.  There were 32 separate numbered comments in all, which addressed 15 separate issues or points of concern about the paper by my tally; one comment was positive (number 12 below).  This is a lot for one paper.

Others might categorize them slightly differently, but I believe they would capture substantially the same criticisms of the reviewers.  Keep in mind that the JAMEZ reviewers likely received a paper that was revised based on the comments of the Nature reviewers.  The tally of issues looks like this to me.

The Issues

In order of decreasing reviewer consensus.  Bold numbers in brackets are the number of reviewers (of the six total) who concurred.

(1)  Inadequately substantiated thesis.  Results do not support conclusions. [6]

(2)  Results of genomic analysis superficially treated.  Results are not adequately documented.  More stepwise in depth treatment needed.  Need more information on analysis of whole genomes. Analytical credibility could be improved by fully leveraging information from next generation sequencing. Sequences should be available.  Bioinformatics should include reference sequences from expected contaminants.  [4]

(3)  Hominin not proven.  Primate a better term. [4]

(4)  Phylogenetic trees not clear, inadequate, or give mixed results. [4]

(5)  Concern about monitoring submitter contamination. Results indicate contamination. [4]  Did avoid contamination. [1]

(6)  S26 provenance murky at best. [2]

(7)  Selectively aggregated sequences to support a favorable placement among primates. Reference sequence bias in developing consensus sequence. [2]

(8)  Poor agreement between mtDNA and nDNA results across samples.  S31 does not align with S26 and S140. [2]

(9)  Change "unknown" hair morphology to "novel." Should cross reference human morphology.  Expected mistakes in hair identification made. Needs statistical analysis. [2]

(10)  Quality control is difficult with contract labs. [1]

(11)  S26 - ethics of shooting. [1]

(12)  Q30 scores important. Seemed to justify publication. [1] 

(13)  Need better photographic evidence.  Stick structures inappropriate. [1]

(14)  mtDNA is all "poorly" human.  Would expect some other lineages. [1]

(15)  Unknown sequences more likely from unknown microorganism. [1]

(16)  Electron microscopy suggests DNA damage. [1]
 


Where to begin?  Number (1) is the bottom line and will fall out of the other criticisms.  I believe the key issues are numbers (2), (4), and (7), which are related.  Here’s what the reviewers said about these issues:

1.  “The bioinformatics should include gene sequences from expected outlier species that may also be capable of contributing contaminating nucleic acids.”

2.  “The molecular genetics in this manuscript are the most important and it would be important to include information regarding the analysis of the whole genomes.”

3. “I would suggest the authors ….build phylogenetic trees with all possible mammal mtDNA genomes and nuclear data available at genbank.”

 
4.  “To make a compelling case I need seeing mtDNA genomes and large numbers of nuDNA sequences that points(sic) in direction of a new hominin species i.e. ape or human like without being identical to know(sic) species.”

 
5.  “…. that several phylogenetic/gene trees have been included, they desperately need redesigning, since the text on the branches is so small that it can't be read without exceptional magnification. As such, the trees are essentially useless.”

6. “Sequencing data should be freely available to the scientific community after publishing (even better, before).”  (NOTE:  Sequencing data was unavailable to the Nature reviewers.  When added, one JAMEZ reviewer made the comment no. 7) 

7.  “The bioinformatics should include gene sequences from expected outlier species that may also be capable of contributing contaminating nucleic acids. For example, a BLASTN search using Sample 26 does turn up some exceptionally strong homology with a gene from Ursus americanus (DQ240386.1). This would support the idea that the consensus sequence may have been affected by contaminant sequences.” (NOTE:  Ursus americanus is the American black bear.)


Just how did the Ketchum Team compare their sequences to those of known species?  Or did they?  Statements such as “Because the global BLASTn demonstrated statistically significant alignment across the Primate order; a Primate ‘Drill Down’ utilizing BLASTn with inclusive Primate organism taxids was analyzed,” are not nearly discriminating enough.  “Statistically significant alignment” needs numbers and comparisons to back it up.  Nowhere does the paper describe a comprehensive set of BLAST™ searches against databases of other animals such as I did in my first paper.  Everything is referenced to human chromosome 11 with no indication of how well it aligns.  Furthermore, restricting the search to “primate organism taxids” only results in tunnel vision and loss of perspective.  Such a procedure should be preceded by more general searches and justified by hard numbers from these.  It was not.
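As an illustration of what "hard numbers" from a general search might look like, here is a minimal Python sketch that tabulates the best hit per taxonomic group from BLAST tabular output. It assumes the standard 12-column tabular format (`-outfmt 6` in the BLAST+ command-line tools); the subject IDs, the group lookup, and all the values in the example rows are invented placeholders, not results from any Ketchum sample.

```python
def best_hit_per_group(tabular_text, group_of):
    """Keep the highest-bitscore hit for each taxonomic group.

    Expects BLAST tabular output: tab-separated columns
    qseqid, sseqid, pident, length, mismatch, gapopen,
    qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for line in tabular_text.strip().splitlines():
        fields = line.split("\t")
        subject = fields[1]
        pident = float(fields[2])
        bitscore = float(fields[11])
        group = group_of.get(subject, "unknown")
        if group not in best or bitscore > best[group][2]:
            best[group] = (subject, pident, bitscore)
    return best


# Placeholder rows with invented IDs and values, for illustration only.
rows = "\n".join([
    "\t".join(["query", "subj_bear", "100.00", "300", "0", "0",
               "1", "300", "1", "300", "1e-150", "500"]),
    "\t".join(["query", "subj_primate", "91.00", "300", "27", "0",
               "1", "300", "1", "300", "1e-100", "350"]),
    "\t".join(["query", "subj_bear2", "99.00", "300", "3", "0",
               "1", "300", "1", "300", "1e-140", "480"]),
])
# Hypothetical lookup from subject ID to taxonomic group.
groups = {"subj_bear": "Ursidae", "subj_bear2": "Ursidae", "subj_primate": "Primates"}

for group, (subject, pident, bitscore) in best_hit_per_group(rows, groups).items():
    print(group, subject, pident, bitscore)
```

Only after a table like this shows primates on top would a primate-only "drill down" be justified.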


“As shown in Supplementary Figures 4, 5 and 6, the Sasquatch consensus that showed homology to human chromosome 11 reference sequence is related to primate lineages including Homo sapiens, Otolemur garnettii, Pan troglodytes (Chimpanzee), Macaca mulatta (Rhesus Monkey), Nomascus leukogenys (White cheeked Gibbon) and Callithrix jacchus (Common Marmoset) and other primate species,”(from the Ketchum paper) is a false statement.  Only Supp. Fig. 4 shows this relationship.  As I pointed out previously (See my blog “Otolemur garnettii is No Lemur and We're Not a Fish, a Chicken, or a Mouse”) Supp. Fig. 5 shows a phylotree with a chicken, a mouse, and 29 species of FISH as nearest relatives.  Supp. Fig. 6 shows only the mouse.  The paper does not say which figure goes with which sample, but Supp. Fig. 4 is likely Sample 31, the human.  I doubt the Ketchum team even translated the Latin (scientific) species names in Supp. Figs. 5 and 6, or they would not have published them without explanation.  These figures do not support the statement above.

Table 3 in my first paper shows exactly the same result that was found in comment no. 7.  At first, I didn’t think much of it because the matching sequence was relatively short compared to others I had matched (my Tables 1, 4, and 5).  Later I realized that there is relatively little black bear data in the databases and that the giant panda and the polar bear are much better represented.  Matches to these bears were much better than any primate (including human) matches for Sample 26 (see Table 1 in my first paper).


Melba’s response to the no. 7 comment was:  “…DQ240386 is statistically significantly aligned with primates and carnivores.  In fact, BLASTing DQ240386- ring tailed cats of the raccoon family and seal have as much alignment as Ursus americanus. The maximum score for raccoon and seal are about 850. Maximum score for Ursus americanus is about 900. Max score for Sample 26 is 538. This shows contamination bias.”

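In her response Melba demonstrates the taxonomy of the suborder Caniformia, order Carnivora, which includes, among other families, the bears (Ursidae), the raccoons, ringtails, and coatis (Procyonidae), and the earless seals (Phocidae).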


The gene (BDNF) contained in the sequence mentioned by Melba is apparently highly conserved, so matches to a variety of Caniformia can be expected.  HOWEVER, “The devil’s in the details.”  Table 1 shows that the top matches of DQ240386 in the nucleotide database are all bears. 

 

Table 1.  Matches to DQ240386.1 - Black Bear

ACCESSION       %ID     SPECIES                  SAME       MIS   GAPS   SCORE
Sample 26       100     black bear               species    0     0      538
XM008686880.1   100     polar bear               genus      0     0      889
DQ093584.1      100     Asiatic black bear       genus      0     0      889
AY011500.1      100     brown bear               genus      0     0      889
AF002239.1      99.79   brown bear               genus      1     0      883
AF002240.1      99.17   Malayan sun bear         family     4     0      867
DQ240387.1      98.77   giant panda              family     6     0      870
GU931015.2      98.75   giant panda              family     6     0      856
GU931011.1      98.54   ringtail                 suborder   7     0      850
DQ660197.1      98.54   ringtail                 suborder   7     0      850
U56638.1        98.54   giant panda              family     7     0      850
GU931013.1      97.92   North American raccoon   suborder   10    0      833
GU931012.1      97.92   ring-tailed coati        suborder   10    0      833
GU167726.1      97.92   leopard seal             suborder   10    0      833
DQ660203.1      97.92   North American raccoon   suborder   10    0      833
DQ660202.1      97.92   crab-eating raccoon      suborder   10    0      833
GU174616.1      97.71   ringed seal              suborder   11    0      830

SAME = closest taxonomic rank shared with the black bear.  MIS = mismatches (bases).  GAPS = gaps in alignment (bases).

Notice that the %ID correlates with the taxonomic relationship (same species, genus, family, or suborder) and that Melba’s statements based on scores are too broad and lack sufficient discrimination.  This is a game of single percentages or tenths of a percent.  The Ursidae (bears) clearly match the reference black bear sequence DQ240386.1 BEST.  Other matches, while good, are clearly not AS good.  Sample 26 has a lower score because the sequence length is shorter; apparently the entire gene (BDNF) was not sequenced in this sample.  
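To see why this is a game of tenths of a percent, the relation between mismatches and %ID can be worked out directly. The sketch below assumes an alignment length of 480 bases, which is my own inference from the mismatch counts and percentages in Table 1 and is not a figure stated in any report.

```python
def percent_identity(aligned_length, mismatches, gaps=0):
    """Percent identity = identical positions / alignment length, in percent."""
    identical = aligned_length - mismatches - gaps
    return round(100.0 * identical / aligned_length, 2)


ALIGNED = 480  # assumed alignment length in bases (inferred, not reported)

for mismatches in (0, 1, 4, 7, 10, 11):
    print(mismatches, percent_identity(ALIGNED, mismatches))
# 0 100.0
# 1 99.79
# 4 99.17
# 7 98.54
# 10 97.92
# 11 97.71
```

Each additional mismatch costs only about 0.2 %ID at this length, so a score of "about 850" versus "about 900" hides exactly the differences that separate a bear from a raccoon or a seal.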

No, there is no “contamination bias” here.  In fact, THE SAMPLE IS FROM A BEAR, as proven by two independent laboratory analyses (Huggins and Sykes, at right) and my interpretation of Ketchum's nDNA sequence in my paper 1, a fact missed by the reviewer and by Melba, who said “…your example of bear contamination can be completely ruled out considering none of the laboratories handling the samples have bear samples,” a red herring, or possibly worse: a misleading diversion.  And 97.71 %ID is a genetic mile from 100%, another fact not recognized by Melba.  So conclude issues (2), (4), and (7), which are all related.


Three more issues are highly significant.  First, (14): the mtDNA is “poorly” human.  This is what I found (Paper 2) for eight of the 18 samples for which a complete mitochondrial genome was determined.  Specifically, these samples contained too many extra mutations to fit neatly into the mtDNA Phylotree (see at right) for human haplogroups.  These eight samples each had less than a 1% chance of being in the human population based on their numbers of these extra mutations: 7-17.  The average for the haplogroup H1a is only 2.37.  Additionally, eight of the remaining 11 samples, for which only the HV1 region was sequenced, had one extra mutation each.  So what are they?  They’re still much closer to human than anything else (>99.9 %ID).  It cannot be determined whether they are of human origin but further evolved since a hybridization event according to the Ketchum theory, or they simply contain sequencing errors due to contamination.  In any case it’s incorrect to say that they are “completely human.”  At best, they might be human mutants.


The best (by score) sequence match to Sample 31 is a fungus and the second best appeared to be a bacterium, so degradation as suggested in comment (15) is real.  Melba privately recognized this to me, yet publicly and in her responses to the reviewers she vigorously denies any degradation or contamination.  This is related to issue (16): the electron microscopy showing single-stranded DNA segments.  I asked the microscopist coauthor, Prof. Andreas Holzenburg of Texas A&M University, about possible explanations for this, and he responded that he only takes pictures and does not engage in the paranormal, which disappointed me (his name's on the paper).  Melba called the microscopy results “supporting evidence for the unusual behavior of the amplified DNA.”  But only viruses contain single-stranded DNA; it’s never been found in other organisms, where it would disrupt the known process of DNA replication.


In spite of Melba’s claims that standard forensic techniques were employed, issues (6) and (13) point out that there is nothing tying any of these samples to a specific identifiable animal.  No wonder the results are so mixed.  New species identification needs a holotype or very convincing photos or videos to accompany a DNA sample.  For example, Sample 26 was collected under two feet of snow weeks after the reported shooting of a sasquatch in the general vicinity.  Who can say that it was related to the shooting since no whole body was found, only this small patch of fur and tissue?

Issue (8) also merits discussion.  The three nuclear DNA sequences align as follows with one another:

S31 vs. S26: Query Cover (alignment) 8% of S31.

S140 vs. S26: Query Cover 62% of S140.

S31 vs. S140: Query Cover 8% of S31.

I pointed this out to Melba in April, 2013.  These cannot be the same species.  Sample 31 is especially far removed from the other two.  This made sense when I found out that S26 was a bear, S31 a human, and S140 a dog.   Notice that the bear and dog align better, being more closely related. 
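For readers unfamiliar with the term, Query Cover is the percentage of the query sequence that is included in at least one aligned segment. A minimal sketch of that calculation follows; the function name and the example coordinates are hypothetical and are not taken from the actual S26, S31, or S140 alignments.

```python
def query_cover(query_length, aligned_ranges):
    """Percent of query positions covered by at least one aligned range.

    Ranges are (start, end) pairs, 1-based and inclusive; overlapping
    ranges are counted only once.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(aligned_ranges):
        start = max(start, last_end + 1)  # skip positions already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return 100.0 * covered / query_length


# Hypothetical aligned segments on a 1000-base query, for illustration only.
print(query_cover(1000, [(1, 50), (40, 80), (200, 219)]))  # 10.0
```

A low Query Cover means most of the query found no counterpart in the other sequence at all, regardless of how well the aligned portion matches.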

I pointed out to Melba that hair diameter comparisons require statistical analysis (issue (9)).  One cannot simply compare two samples and say with certainty whether they belong to the same population.  A statistical distribution of the suspected population is needed, and preferably also of the target organism.  Any two hair samples may be outliers.  Standard statistical tests can determine the probability that a sample or collection of samples belongs to a specific population.  She didn’t do this. 
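A very simple version of such a test is a z-score: how many standard deviations a measured diameter lies from the mean of a reference population. The sketch below uses Python's standard `statistics` module; the reference diameters are invented numbers in micrometers, not measurements from the paper, and a real analysis would need a properly sampled reference set and a check that the distribution is roughly normal.

```python
import statistics


def z_score(value, reference):
    """Standard deviations between a value and the reference sample mean."""
    mean = statistics.mean(reference)
    sd = statistics.stdev(reference)  # sample standard deviation
    return (value - mean) / sd


# Hypothetical reference hair diameters (micrometers), for illustration only.
reference_diameters = [60, 70, 80, 90, 100]

print(z_score(80, reference_diameters))             # 0.0, right at the mean
print(round(z_score(130, reference_diameters), 2))  # well above the mean
```

A single hair with a large z-score may still be an outlier from the same population, which is why a collection of measurements and a formal test are needed before claiming a difference.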

Lastly, issue (5) contamination is important, but hard to prove, especially when some samples were misidentified.  My third paper demonstrates multiple species in some samples.  What is the “origin” and what is “contamination” can be difficult to answer, but in any case no new species was proven to exist.  A number of specific gene sequences did not align with any species in several databases.  These may be examples of issue (15) unknown microorganisms and mispriming, as pointed out by one reviewer.
No wonder all six reviewers agreed on issue (1) Inadequately substantiated thesis.  The results do not support the conclusions. This is precisely what I found in my papers - flawed interpretation of results. 

Were these reviews fair and unbiased?  Based on the results in my three papers, they were on target, although having the DNA sequences available from the start would have made possible better substantiated reviewer criticism in some cases, especially for the Nature reviewers.  The paper didn’t provide the usual kinds of evidence for a new species, so the reviewers did not recommend publication in its final form.  In some cases Melba was able to satisfy relatively minor requests for semantic and other changes, but more importantly the DNA sequences when carefully interpreted do not point to a new species.  This does not disprove the existence of sasquatch, only that the Ketchum study does not prove its existence.

I leave you with the following quotation from Scott Carpenter (see his blog site at right):

“I make a challenge to any one that has access to the BLAST application. Take the supplemental attachments in Dr. Ketchum's study that contain the raw DNA sequence data, convert those PDF files into the correct format and then compare the sequences to the GenBank database and see if there are any matches. We already know the data is good, it has been produced by the University of Texas. Have the courage to run the BLAST search and publish the results! Because you know they will show no match, that the species represented in the data is a NEW species, a NEW homo sapiens species  a BIGFOOT.....”

Thanks, Scott, I did just that.  Sample 26 is a bear.  Sample 31 is human.  Sample 140 is a dog.  Nothing new here.  See my first paper.  Now, Scott, please take your own challenge and follow my procedures as outlined in my three part blog on using BLAST™.  Then show us YOUR results.  Until you do so, you have no credibility.  Walk the talk, please.  Your University of Texas experts blew it, big time.  Ask them to review my papers as I did theirs and tell us what they said.  I asked Melba to do this and she refused.  I sent coauthor Dr. Fan Zhang of the University of North Texas my work, requested a response, and received none.  These are not examples of the open mindedness and unbiasedness which Melba demands of others.   

And no, Scott and Melba, “Accept for publication with revisions” which you refused to make does not "pass peer review."