If you are in plan B, remove the adult flies from the vial you set up on Thursday, and count them.
Work on your individual project.
BLAST the D. melanogaster protein sequence you've been working on for the last couple of weeks against the genome of Drosophila yakuba. This species is close enough to D. melanogaster that the synonymous sites are not saturated, but distant enough that there will probably be some protein sites that are different. Save the protein sequence in FASTA format (if you have taxon set D or E, you've already done this). Then change the format back to GenPept and scroll down until you see something like this:
     Site            order(358,362)
                     /site_type="active"
                     /db_xref="CDD:88411"
     CDS             1..558
                     /gene="Pgi"
                     /coded_by="join(L27683.1:159..292,L27683.1:544..1126,
                     L27683.1:1497..1736,L27683.1:1795..2323,
                     L27683.1:2382..2572)"
Click on the link for CDS. This will give you the DNA sequence for the coding region of the gene, the part that actually codes for amino acids. Change its format to FASTA and copy it into a file. Get the DNA sequence for the coding region of your gene in D. melanogaster, as well (it will probably be easiest to just BLAST your saved D. melanogaster protein sequence against the D. melanogaster genome).
Align the D. melanogaster and D. yakuba protein sequences using ClustalW, and count the number of amino acid differences between the two sequences. Then align the DNA sequences and count the number of nucleotide differences. Subtract the number of amino acid differences from the number of nucleotide differences; this is your estimate of the number of synonymous differences (it assumes that each amino acid difference results from a single nucleotide difference). In most genes, there are about 3 times as many amino acid replacement sites as there are synonymous sites, so multiply the total length of the coding sequence (in nucleotides) by 0.75 to get a rough estimate of the number of replacement sites, and multiply the total length by 0.25 to get a rough estimate of the number of synonymous sites.
Divide the number of amino acid differences by the number of replacement sites; this is Ka (also known as dN). Divide the number of synonymous differences by the number of synonymous sites; this is Ks (also known as dS). Then divide Ka by Ks; this is the Ka/Ks ratio.
If all the mutations in a gene are neutral, Ka/Ks should be about 1. If some mutations are neutral but some amino acid replacement mutations are harmful, Ka/Ks should be less than 1 (this is the most common observation). A Ka/Ks value greater than 1 suggests that positive selection is causing the protein sequences to be different.
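The arithmetic for the crude estimate can be sketched in a few lines of Python. This is only an illustration of the steps described above, not part of the assignment: the function names are made up for this example, the sequences are assumed to be already aligned (for example, pasted from the ClustalW output with "-" for gaps), and the numbers at the bottom are invented rather than taken from any real gene.

```python
def count_differences(aligned_a, aligned_b):
    """Count aligned positions where the two sequences differ.

    Positions where either sequence has a gap ("-") are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(
        1
        for x, y in zip(aligned_a, aligned_b)
        if x != y and x != "-" and y != "-"
    )


def crude_ka_ks(nucleotide_diffs, amino_acid_diffs, cds_length):
    """Return (Ka, Ks, Ka/Ks) using the rough 0.75 / 0.25 site estimates."""
    # Nucleotide differences that did not change an amino acid.
    synonymous_diffs = nucleotide_diffs - amino_acid_diffs
    # Rough site counts: about 3 replacement sites per synonymous site.
    replacement_sites = 0.75 * cds_length
    synonymous_sites = 0.25 * cds_length
    ka = amino_acid_diffs / replacement_sites
    ks = synonymous_diffs / synonymous_sites
    if ks == 0:
        raise ValueError("no synonymous differences, so Ka/Ks is undefined")
    return ka, ks, ka / ks


# Invented example: a 1200-nucleotide coding sequence with
# 60 nucleotide differences and 10 amino acid differences.
ka, ks, ratio = crude_ka_ks(nucleotide_diffs=60, amino_acid_diffs=10, cds_length=1200)
print(f"Ka = {ka:.4f}, Ks = {ks:.4f}, Ka/Ks = {ratio:.4f}")
```

With these invented numbers, Ka/Ks comes out well below 1, which is the pattern expected when some amino acid replacement mutations are harmful.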
Your calculation of Ka/Ks above was pretty crude; it ignored the possibility that some sites would have more than one mutation, and it didn't actually count the number of synonymous and replacement sites. Download the free program DnaSP and install it on your computer. Use it to calculate Ka/Ks for your two sequences, and see whether it is significantly greater than 1.
Also use DnaSP to calculate the Codon Bias Index (CBI) for your gene in D. melanogaster. Calculate the CBI for D. yakuba, as well. Save the Codon Usage Table for your gene in each species.
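To see what a codon usage table contains, here is a minimal Python sketch that tallies how many times each codon appears in a coding sequence. It is not a substitute for DnaSP: it does not calculate the CBI, the function name is made up for this example, and the short sequence is invented. It assumes the input is a coding sequence that starts at the first codon position and whose length is a multiple of 3.

```python
from collections import Counter


def codon_usage(cds):
    """Return a Counter mapping each codon to the number of times it occurs."""
    # Remove whitespace and line breaks, and use upper case throughout.
    cds = "".join(cds.split()).upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Step through the sequence one codon (3 nucleotides) at a time.
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


# Invented example sequence: Met, Ala, Ala, Ala, stop.
usage = codon_usage("ATGGCCGCCGCTTAA")
for codon, count in sorted(usage.items()):
    print(codon, count)
```

In this invented sequence, alanine is encoded twice by GCC and once by GCT; unequal use of synonymous codons like this is what a codon bias measure summarizes.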
This page was last revised October 6, 2010. Its URL is http://udel.edu/~mcdonald/geneticslab11.html