I hope that you are enjoying this exercise. Your host has been the National Center for
Biotechnology Information (NCBI), a division of the National Library of Medicine (NLM) of
the National Institutes of Health (NIH). Researchers from around the world submit
their DOCUMENTED DNA samples from CONTROLLED EXPERIMENTS to the databases in
this library, for all of us to search. And just when you may have thought that your
federal government never does anything FOR YOU. I wonder why anybody would think that.
I have maintained all along that Ketchum et al. did not search any database other
than the “Nucleotide Collection.” If I’m wrong, she should present some dated
output proving it; I'd really like to see what they found. We searched the Nucleotide
Collection in Part I. I believe that is all Melba ever did. In order to
construct an all-inclusive table of BEST matches like Table 1 in my first
paper, one needs to do multiple searches.
Additionally, whereas we found the polar bear in our first search, no
polar bear data were in the Nucleotide Collection when I did my initial work, and
certainly not when Melba did hers prior to mine.
Let’s do another search. If you are still logged on from Part I, go directly
to your BLAST™ input page, and skip to step 2 below.
1. You can omit steps 1. and 2. of Part I, now that you have a FASTA file of the S26
sequence, saved where you can find it. Proceed immediately through steps 3. to 4.a.
2. For 4.b. enter a “Job Title” as “S26 vs. reference genomic sequence.”
3. Now on the “Database” dropdown menu select Reference genomic sequences (refseq_genomic).
This is a database of COMPLETE nuclear genomes. It has far fewer species than the Nucleotide
Collection, but, importantly, it has more data on any given species. This will prove to be
critical to finding the best match for S26. We didn’t search it first in Part I because the
number of species it contains is VERY much smaller than in the Nucleotide Collection.
4. Now in the “Organism” field type Ursidae and when it comes up click on it. The “taxid:
9632” locates all bears in the “Taxonomy” database mentioned in Part I. This will limit our
search to bears only, which will also greatly reduce search time.
5. Then complete steps 4.d through 4.f in Part I, except select 500 for “Max target
sequences.” We won’t need as many output sequences this time, because we are only
searching for bears. “Word size” 64 is still important.
6. The BLAST™ results screen will eventually open up before your eyes. This time notice
only two species of bear on the hit list, the polar bear (Ursus maritimus) and the giant
panda (Ailuropoda melanoleuca). The black bear (Ursus americanus) has not yet shown
up. Glance at the “Ident” column to see that these are VERY GOOD matches to our S26
sequence: in retrospect, a genus-level match for the polar bear and a family-level match
for the panda.
7. Now let’s take a little break from searches to investigate the taxonomy/phylogeny of
bears. Do not delete your hit results page; we’ll come back to it in a moment.
Open the NCBI webpage: http://www.ncbi.nlm.nih.gov/ and click “Taxonomy” on the left side
under “Databases.” In the new page click Browser under “Taxonomy Tools.” Now in the new
input page go to “Search for” in the upper left and enter bears.
8. A phylogenetic tree for bears is seen. Use a little imagination: branches to other
species are off the page to the left; each line is a twig; indentations indicate levels of
the branches and twigs. Under the family Ursidae, indented one level, are the several
genera; then under each genus, indented another level, are the species; finally, indented
one more level, are the subspecies. The key takeaway points are: 1) the giant
panda is in a genus by itself (Ailuropoda), and 2) the black bear and the polar bear are in
the same genus, Ursus, and are therefore more closely related than either is to the panda.
Keep this in mind as we proceed to examine our new search results.
9. Back to the BLAST™ results page. Follow steps 7.a. to 7.e. from Part I. You may want to
copy the column headings from your first Excel file in Part I. Let’s look at this file.
Also, please open up the Excel file from the Nucleotide Collection search in Part I for
comparison. We have a new champion “best of show” match (highest %ID and score) at the top
of the new Excel file. Check by clicking the accession number in the BLAST™ output to see
that it’s a polar bear.
10. Let’s focus on three lines of data: the first line (Pacific walrus) of the Nucleotide
Collection Excel results file (Part I) and the first two lines (a polar bear and a panda)
of the new Reference Genomic Sequences Excel results file produced above. A summary of the
important data follows, abbreviated from these two source files (four columns were omitted
as presently irrelevant). I added “Species” for your convenience.
Accession      | Species | %ID   | LENGTH | MIS. | GAPS | Q-Start | Q-stop | SCORE
Nucleotide Collection
XM_004394587.1 | PW      | 96.58 | 2136   | 45   | 2    | 189026  | 191136 | 3515
Reference Genomic Sequences
NW_007929448.1 | PB      | 98.83 | 2139   | 0    | 1    | 189028  | 191141 | 3788
NW_003218202.1 | GP      | 97.15 | 2141   | 33   | 2    | 189026  | 191141 | 3591
PW = Pacific walrus, PB = polar bear, GP = giant panda
Can you see that over this query sequence range
(Q-start to Q-stop) the order of match is (best to worst):
1. Polar bear: best match, highest %ID, fewest mismatches, fewest gaps, and highest score.
2. Giant panda: next best, intermediate.
3. Pacific walrus: worst match, lowest %ID, most mismatching bases, tied for most gaps,
lowest score?
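If you would rather let a script do the ranking than eyeball it, here is a minimal Python sketch. It only re-types the three rows from the summary table; the `Hit` record and the `rank_hits` helper are names I made up for illustration, and sorting by score with %ID as a tiebreaker is my assumption about what "best" means here, not something BLAST™ dictates.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One row of the summary table (hypothetical record type)."""
    accession: str
    species: str
    pct_id: float
    length: int
    mismatches: int
    gaps: int
    q_start: int
    q_stop: int
    score: int


# The three rows exactly as they appear in the summary table.
HITS = [
    Hit("XM_004394587.1", "Pacific walrus", 96.58, 2136, 45, 2, 189026, 191136, 3515),
    Hit("NW_007929448.1", "polar bear", 98.83, 2139, 0, 1, 189028, 191141, 3788),
    Hit("NW_003218202.1", "giant panda", 97.15, 2141, 33, 2, 189026, 191141, 3591),
]


def rank_hits(hits):
    """Return hits best-first: highest score, then highest %ID (assumed ordering)."""
    return sorted(hits, key=lambda h: (h.score, h.pct_id), reverse=True)


ranked = rank_hits(HITS)
for place, hit in enumerate(ranked, start=1):
    print(f"{place}. {hit.species}: %ID={hit.pct_id}, "
          f"mismatches={hit.mismatches}, gaps={hit.gaps}, score={hit.score}")
```

The same ordering falls out whether you sort on score or on %ID for these three rows, which is why the ranking in the list is unambiguous.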
Notice how close these %IDs and scores are, yet who
would confuse a walrus, a polar bear and a panda by sight alone? This is because of
conservation of genes (those invisible little segments of
your chromosomes in the nucleus of all your cells), namely that important genes
are passed down through evolution from ancestor to progeny with minimal
mutations. This is why I invented the concept of moments to compare matches. See
my first paper. It uses %ID - 95: i.e. 1.58, 3.83, and 2.15, respectively, top
to bottom above. These numbers differ much more in relative magnitude than do
96.58, 98.83, and 97.15, respectively. Important phylogeny is not
as likely to be lost in rounding off or “eyeballing” numbers.
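To see how much the subtraction spreads the numbers apart, here is a small Python sketch of the arithmetic. It implements only the "%ID - 95" step described here; the fuller definition of moments is in the first paper, and the `moment` function name and the ratio comparison are my own illustration.

```python
BASELINE = 95.0  # the offset used in the text: %ID - 95

# %ID values taken from the summary table.
pct_ids = {
    "Pacific walrus": 96.58,
    "polar bear": 98.83,
    "giant panda": 97.15,
}


def moment(pct_id, baseline=BASELINE):
    """Return %ID minus the baseline, rounded to two decimals."""
    return round(pct_id - baseline, 2)


moments = {species: moment(value) for species, value in pct_ids.items()}

# Compare the best and worst match as ratios, before and after the offset.
raw_ratio = pct_ids["polar bear"] / pct_ids["Pacific walrus"]
moment_ratio = moments["polar bear"] / moments["Pacific walrus"]

print(moments)
print(f"raw %ID ratio: {raw_ratio:.3f}, moment ratio: {moment_ratio:.2f}")
```

On the raw %IDs the polar bear beats the walrus by only a couple of percent in relative terms; on the offset values it is more than double.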
Next time in Part III we’ll look for the black bear
(drilling down) and discover why it’s still hibernating from us, and we’ll look harder
for Homo sapiens, other primates, and all other animals (stepping back). We want
to get this right before telling Melba.
As it is, she’s going to be very upset with us. She may even unfriend us on FB and call our
work “nastiness.”
Save your BLAST™ results page as a Webpage (*.html) file. Really enterprising students may
wish to attempt the above searches with Samples 31 and 140, downloading their
sequences from Melba’s Sasquatch Genome Project website (on the right). But, not to worry,
I’ll give you some hints later if you don’t feel up to this just yet.
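For anyone who wants to repeat the search settings from steps 2 to 5 on other samples without clicking through the form each time, here is a hedged Python sketch that only assembles the settings; it sends nothing to NCBI. The parameter names follow the NCBI BLAST URL API as I understand it, so check them against the current documentation before relying on them. Megablast is my assumption (word size 64 is only offered there on the web form), `build_search_params` is a hypothetical helper, and the FASTA text is a placeholder, not the real S26 sequence.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_search_params(fasta_text: str) -> dict:
    """Collect the Part II search settings as URL API parameters (assumed names)."""
    return {
        "CMD": "Put",                      # submit a new search
        "PROGRAM": "blastn",
        "MEGABLAST": "on",                 # assumption: Part I used megablast
        "DATABASE": "refseq_genomic",      # Reference genomic sequences
        "ENTREZ_QUERY": "txid9632[ORGN]",  # limit to Ursidae (bears)
        "HITLIST_SIZE": "500",             # "Max target sequences"
        "WORD_SIZE": "64",
        "QUERY": fasta_text,
    }


# Placeholder only; substitute the contents of your saved FASTA file.
placeholder_fasta = ">S26 placeholder\nACGTACGTACGT"

params = build_search_params(placeholder_fasta)
query_string = urlencode(params)

# Show the start of the request that would be submitted, without sending it.
print(f"{BLAST_URL}?{query_string[:80]}...")
```

Swapping in a different FASTA file is the only change needed to set up the same search for another sample.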
You’ve been a really good class. (The clapping and cheering is deafening, even
on the Internet.)
P.S. Has anyone
found a lemur yet? Hope not.