Biological Data Analysis: Homework 1

Due Tuesday, Sept. 8



You must type this and all other homework assignments. Do not e-mail the assignment to me; turn it in early (at 322 Wolf) for a foreseeable absence, or turn it in late after an unexpected absence from class.

1. Choose an article from your favorite scientific journal. It should be a regular-sized article (not a brief note) in a specialized journal (not Science, Nature, or PNAS). If you don't have a favorite journal, I recommend Evolution. Make sure it has some kind of statistical test in it. Read through it and identify at least six variables that are analyzed in the paper. For each variable, provide the name of the variable (such as "LAM"), and if it's not obvious from the name, give a short explanation of what the variable is (such as "length of the anterior adductor muscle scar on a mussel shell"). Then say whether the variable is a measurement variable, an attribute variable, or a ranked variable. If a measurement variable has been converted to an attribute variable, or if the percentages from an attribute variable have been analyzed as if they were a measurement variable, mention this. You must have at least six variables; if you don't have six, do more than one paper.

2. Next, find one statistical test that is used in the paper. It may be difficult after only one week of class, but try to answer the following:

3. Give the citation information (authors, year, article title, journal, volume, page numbers) for the article or articles you've used.

Mmmmm, bonus: Everyone who finds an article with a true ranked variable (not a measurement variable converted to a ranked variable for a non-parametric test) gets a donut at the next class.


4. A broken collarbone is a common injury among bicycle racers. With the help of messages on various websites that are popular among professional and serious amateur bicycle racers, you identify 122 bicyclists who have broken a collarbone twice. Under the null hypothesis that the right and left collarbones break equally often, and that the first and second breaks are independent of each other, how many of the bicyclists would you expect to have broken the same collarbone twice?

The probability of breaking the left collarbone is 0.5, so the probability of breaking the left collarbone AND the left collarbone is 0.5×0.5=0.25. The same is true for the right. The probability of breaking left+left OR right+right is 0.25+0.25=0.5. With 122 bicyclists, you would therefore expect 122×0.5=61 of them to have broken the same collarbone twice.
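
If you'd like to check the arithmetic, here is a minimal Python sketch of the same calculation. The variable names are made up for illustration, and it assumes exactly what the null hypothesis says: each break is left or right with probability 0.5, and the two breaks are independent.

```python
from fractions import Fraction

# Null hypothesis: left and right collarbones break equally often.
p_left = Fraction(1, 2)
p_right = 1 - p_left

# Independent breaks: multiply for AND, add for OR.
p_same = p_left * p_left + p_right * p_right

n_bicyclists = 122
expected_same = n_bicyclists * p_same

print(p_same)         # probability of breaking the same collarbone twice
print(expected_same)  # expected number of bicyclists out of 122
```

Using exact fractions instead of decimals avoids any rounding in the multiplication.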

5. A calico cat has patches of black, orange, and white fur. The white fur is caused by the dominant S allele at the spotting locus. The combination of orange and black occurs in cats that are heterozygous (one O allele and one o allele) at the orange locus. The orange locus is on the X chromosome, so only female cats can be calico. A calico cat with the Ss genotype at the spotting locus mates with an all-black cat and gives birth to one kitten. What is the probability that it is calico?

In order to be a calico, the kitten has to be female (probability 0.5), get the S allele from its Ss mother (probability 0.5), and get the O allele from its Oo mother (probability 0.5); a female kitten always gets an o allele from its all-black father, so it needs the O from its mother to be heterozygous. The probability of being female AND getting the S AND getting the O is 0.5×0.5×0.5=0.125.
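
The same answer can be reached by listing every equally likely kitten and counting the calicos. This is only a sketch: the function name is made up, and it assumes the all-black father is ss at the spotting locus and carries an o allele on his X chromosome, so only the mother's alleles and the kitten's sex vary.

```python
from fractions import Fraction
from itertools import product

def is_calico(sex, spotting_from_mother, orange_from_mother):
    # Assumption: the all-black father contributes s at the spotting locus
    # and, to daughters, an o allele at the orange locus.
    # A calico must be female, carry S, and be heterozygous Oo,
    # so the O has to come from the mother.
    return (sex == "female"
            and spotting_from_mother == "S"
            and orange_from_mother == "O")

# Eight equally likely combinations: sex x mother's S/s x mother's O/o.
outcomes = list(product(["female", "male"], ["S", "s"], ["O", "o"]))
n_calico = sum(is_calico(*outcome) for outcome in outcomes)
p_calico = Fraction(n_calico, len(outcomes))

print(p_calico)
```

Only one of the eight combinations is a calico, which matches 0.5×0.5×0.5.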


Here are a couple of classic probability puzzles. Don't spend a lot of time on them, and don't worry if you don't get the right answer; you'll get full credit as long as you write something down. If you do get the right answer, let me know whether it's because you've seen the puzzles before, or because you're just a genius.

6. There are 57 students in the class. Estimate the probability that there is at least one pair of "birthday twins" in the class--people born on the same date (not necessarily the same year). If you got an answer, what simplifying assumptions did you make? How do you think your answer would change if you used a more accurate model?

The simplifying assumptions are that there are 365 days in the year (ignoring leap years), and that each day has the same number of births (ignoring seasonal variation in birth rate). With these assumptions, pick one person in the class; they have a birthday. The probability that the second person you look at has a different birthday is 364/365. If that's the case, two birthdays are now taken. The probability that the third person you look at has a different birthday from the first two is 363/365. The probability that the fourth person has a different birthday from the first three is 362/365, etc. The probability that the second birthday is different from the first, AND the third birthday is different from the first two, AND the fourth birthday is different from the first three, etc. is 364/365×363/365×362/365×...×(365-57+1)/365. This is about 0.01, so there is about a 1% probability that all 57 birthdays are different. Therefore there is about a 99% probability that at least two of the birthdays are the same; in other words, that there's at least one pair of birthday twins. Leap years would slightly decrease this probability, while seasonal variation in birth rates would increase it.
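
Multiplying 56 fractions by hand is tedious, so here is a short Python sketch of that product. The function name is invented for illustration, and it makes the same simplifying assumptions as the answer: 365 equally likely birthdays and no leap years.

```python
def prob_all_different(n_people, days=365):
    """Probability that n_people all have different birthdays,
    assuming every one of `days` dates is equally likely."""
    p = 1.0
    for k in range(n_people):
        # k birthdays are already taken, so (days - k) dates are still free.
        p *= (days - k) / days
    return p

p_none = prob_all_different(57)
p_twins = 1 - p_none

print(round(p_none, 2))   # probability of no shared birthday
print(round(p_twins, 2))  # probability of at least one pair of birthday twins
```

The first factor in the loop is 365/365, which is 1, so the product is the same as starting at 364/365.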

7. You are on a game show where you are told to pick one of three doors. Two of the doors have goats (Capra hircus) behind them, and one has a brand new, biodiesel-powered car. You pick door #1. In order to prolong the suspense, the host of the show (who knows where the car is) always opens one of the doors with a goat. In this case, he opens door #3, revealing a goat, and asks you whether you want to stick with door #1 or switch to door #2. What should you do? What is the probability of getting a car if you stick with door #1?

The probability that the car is behind door #1 is one-third, and there is a two-thirds probability that it's behind door #2 or door #3. If it's behind door #2, the host will reveal the goat behind door #3, so switching will get you the car. If it's behind door #3, the host will reveal the goat behind door #2, and switching will get you the car. So if you stick with door #1, your probability of getting the car is 1/3; if you switch, you increase your chances from 1/3 to 2/3.
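
If the argument doesn't convince you, you can play the game many times on a computer. The following Python simulation is a rough sketch, not part of the assignment; the function names, the number of trials, and the random seed are arbitrary choices. It assumes the car is equally likely to be behind any door, and that when the host has two goat doors to choose from, he picks one at random.

```python
import random

def play(switch, rng):
    """Play one game; return True if the contestant wins the car."""
    doors = [1, 2, 3]
    car = rng.choice(doors)
    pick = 1  # the contestant always starts with door #1

    # The host opens a door that is neither the contestant's pick nor the car.
    goat_doors = [d for d in doors if d != pick and d != car]
    opened = rng.choice(goat_doors)

    if switch:
        # Move to the one door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials

print("stick: ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

With this many trials, the two win rates should come out close to 1/3 and 2/3, though the exact digits depend on the seed.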


Here are four practice questions for the final exam. For each experiment, list the variables that are mentioned in the description, and say whether each is a nominal, measurement, or ranked variable. Don't list variables that are not mentioned in the description; for example, don't list "weight of mice" for the first experiment.

8. You want to know whether food coloring affects the activity of mice. You feed 12 mice Purina Mouse Chow and 15 other mice Purina Mouse Chow with 10 mg yellow food coloring/kg added. You record how many hours a day each mouse spends running on its exercise wheel.

Food with or without yellow food coloring: nominal. Hours a day running: measurement.

9. You want to know what kind of cat people like the best. You go to an animal shelter and take a picture of each cat, which you clip to its cage. When a cat gets adopted, the staff of the shelter pins the photo to a bulletin board, starting at the upper left and working their way down. Once all the cats are adopted, you record the position of each photo and the pattern of the cat: solid orange, orange and white, solid black, black and white, calico, etc.

Order in which cats are adopted: ranked. Pattern of cat: nominal.

10. You're trying to figure out what trees squirrels like the best. You go to White Clay Creek State Park and find an area with a mix of oak, maple, sycamore, redbud, and dogwood trees. You measure the height and diameter of each tree. You put a radio collar on a squirrel, then record the amount of time it spends in each tree. You do this for six squirrels.

Height of tree: measurement. Diameter of tree: measurement. Species of tree: it wasn't clear that you recorded this, but if you did, it's nominal. Amount of time in tree: measurement.

11. You want to know what percentage of college students in different majors accept the evidence for evolution. You survey 1,372 undergraduates and ask each student their major, what year they are, their gender, their GPA, how many biology classes they've taken, and whether they agree or disagree with the statement, "All living organisms on earth, including humans, have evolved over billions of years from earlier life by natural processes."

Major: nominal. Year: nominal (or maybe measurement--this is in that gray area that isn't supposed to be on exam questions). Gender: nominal. GPA: measurement. Number of biology classes taken: measurement. Agree or disagree with question: nominal.



Return to John McDonald's home page

This page was last revised August 3, 2009. Its URL is http://udel.edu/~mcdonald/stathw1.html