# Biological Data Analysis: Homework 1

## Due Tuesday, Sept. 4

You must type this and all other homework assignments. Do not e-mail the assignment to me; turn it in early (at 322 Wolf) for a foreseeable absence, or turn it in late after an unexpected absence from class.

1. Choose an article from the lab you're in (if you're in a lab) or from your favorite scientific journal. It should be a regular-sized article (not a brief note) in a specialized journal (not Science, Nature, or PNAS). If you don't have a lab or a favorite journal, I recommend Evolution. Make sure the article has some kind of statistical test in it. Read through it and identify at least six variables that are analyzed in the paper. For each variable, provide the name of the variable (such as "LAM"), and if it's not obvious from the name, give a short explanation of what the variable is (such as "length of the anterior adductor muscle scar on a mussel shell"). Then say whether the variable is a measurement variable, a nominal variable, or a ranked variable. If a measurement variable has been converted to a nominal variable, or if the percentages from a nominal variable have been analyzed as if they were a measurement variable, mention this. You must have at least six variables; if you don't have six, do more than one paper.

2. Next, find one statistical test that is used in the paper. It may be difficult at this point in the class, but try to answer the following:

• What biological question is being answered?
• What are the biological null and alternative hypotheses?
• What are the statistical null and alternative hypotheses?
• What variables are being analyzed in this particular test?
• What is the name of the statistical test that is being used?
• What probability (P-value) results from the test?

3. Give the citation information (authors, year, article title, journal, volume, page numbers) for the article or articles you've used.

Mmmmm, bonus: Everyone who finds an article with a true ranked variable (not a measurement variable converted to a ranked variable for a non-parametric test) gets a donut at the next class.

4. A broken collarbone is a common injury among bicycle racers. With the help of messages on various websites that are popular among professional and serious amateur bicycle racers, you identify 122 bicyclists who have broken a collarbone twice. Under the null hypothesis that the right and left collarbones break equally often, and that the first and second breaks are independent of each other, how many of the bicyclists would you expect to have broken the same collarbone twice?

The probability that the first break is the right collarbone is 1/2. The probability that the first AND second breaks are the right collarbone is 1/2 times 1/2, which equals 1/4. The same logic applies to breaking the left collarbone and then the left collarbone again. The probability of breaking right and right, OR breaking left and left, is 1/4 plus 1/4, which equals 1/2. With 122 bicyclists, you would therefore expect 122 times 1/2, which equals 61, to have broken the same collarbone twice.
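If you'd like to check the arithmetic in question 4, here is a minimal Python sketch that lists the four equally likely right/left sequences and counts the ones where the same side breaks twice. The variable names are just for illustration.

```python
from fractions import Fraction
from itertools import product

# All equally likely (first break, second break) combinations under the null:
# ('R','R'), ('R','L'), ('L','R'), ('L','L')
outcomes = list(product("RL", repeat=2))

# Keep the outcomes where the same collarbone broke both times
same_side = [o for o in outcomes if o[0] == o[1]]

p_same = Fraction(len(same_side), len(outcomes))
expected = 122 * p_same  # expected number among the 122 bicyclists

print(p_same, expected)  # 1/2 61
```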

5. A calico cat has patches of black, orange, and white fur. The white fur is caused by the dominant S allele at the spotting locus. The combination of orange and black occurs in cats that are heterozygous (one O allele and one o allele) at the orange locus. The orange locus is on the X chromosome, so only female cats can be calico. A calico cat with the Ss genotype at the spotting locus mates with an all-black cat and gives birth to one kitten. What is the probability that it is calico?

The mother is Oo (she has orange and black) and Ss (the question gives her genotype). The father is ss (all black, in other words no white patches) and oY (one black allele on the X chromosome, plus a Y chromosome). The probability that the kitten is female is 1/2. The probability that the kitten gets the S allele from its Ss mother, necessary to give it white patches, is 1/2. The probability that the kitten gets the O allele from its Oo mother, necessary to give it orange patches, is 1/2. The probability that the kitten is female AND has white patches AND has orange patches is 1/2 times 1/2 times 1/2, which equals 1/8.
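The same answer to question 5 can be reached by brute force: list every equally likely combination of gametes from the two parents and count the calico kittens. This is only a sketch; the way the alleles are encoded as strings, and the helper name `is_calico`, are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

mother_x = ["O", "o"]     # orange-locus allele on the X the mother passes on
mother_spot = ["S", "s"]  # spotting-locus allele from the Ss mother
father_sex = ["o", "Y"]   # father passes either his X (carrying o) or his Y
father_spot = ["s"]       # the ss father can only pass s

def is_calico(mx, ms, fs, fsp):
    female = fs != "Y"                                   # got an X from the father
    orange_and_black = female and {mx, fs} == {"O", "o"}  # heterozygous Oo
    white = "S" in (ms, fsp)                             # at least one dominant S
    return orange_and_black and white

# 2 x 2 x 2 x 1 = 8 equally likely kittens
kittens = list(product(mother_x, mother_spot, father_sex, father_spot))
p_calico = Fraction(sum(is_calico(*k) for k in kittens), len(kittens))

print(p_calico)  # 1/8
```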

6. On Mythbusters, Jamie and Adam wanted to know whether it was true that toast always falls buttered-side down when you drop it on the floor. They dropped 48 pieces of toast from the roof of their building; 29 landed buttered-side up, and 19 landed buttered-side down. Do the exact binomial test on these data using the spreadsheet linked from the Handbook of Biological Statistics, under "how to do the test." Report the P-value. Then do the test using Richard Lowry's web page linked there, and report the P-value.

The P-value is 0.193.
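For anyone curious what is going on behind the scenes, here is a rough Python sketch of an exact binomial test using only the standard library. It adds up the probabilities of every outcome that is no more likely than the observed one, which is one common way to get a two-tailed P-value; I'm not claiming this is exactly how the spreadsheet or the web page does it, and the function names are made up for the example.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_binomial_two_tailed(k, n, p=0.5):
    """Sum the probabilities of all outcomes no more likely than k."""
    # Small relative tolerance so floating-point ties count as equal
    cutoff = binom_pmf(k, n, p) * (1 + 1e-7)
    probs = (binom_pmf(i, n, p) for i in range(n + 1))
    return sum(q for q in probs if q <= cutoff)

# 29 of 48 pieces of toast landed buttered-side up
p_value = exact_binomial_two_tailed(29, 48)
print(round(p_value, 3))  # 0.193
```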

7. Write a sentence or two explaining what the P-value you got in question 6 means.

This means that if the null hypothesis of 50% buttered-side up is true, you would get a result at least this extreme (29 or more buttered-side up, or 29 or more buttered-side down) 19.3% of the time.
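One way to convince yourself of this interpretation is to simulate the experiment many times under the null hypothesis and see how often the result is at least as lopsided as 29 to 19. The sketch below does that with Python's standard `random` module; the number of simulated experiments and the seed are arbitrary choices, so the fraction will only be close to the exact P-value, not equal to it.

```python
import random

def simulate_extreme_fraction(n_drops=48, observed=29, n_trials=20_000, seed=1):
    """Fraction of simulated experiments with >= observed ups or downs."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_trials):
        # Each of the n_drops random bits is one fair drop: 1 = buttered-side up
        ups = bin(rng.getrandbits(n_drops)).count("1")
        downs = n_drops - ups
        if ups >= observed or downs >= observed:
            extreme += 1
    return extreme / n_trials

fraction = simulate_extreme_fraction()
print(fraction)  # should land near 0.19
```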

8. Flip a coin 20 times, and record the number of heads and tails. Report your results, in order: for example, HHTHTTHTHHHHTTHTHHTT. Do the exact binomial test, using whichever method you prefer, and report the P-value.

Here's a classic probability puzzle. Don't spend a lot of time on it, and don't worry if you don't get the right answer; you'll get full credit as long as you write something down. If you do get the right answer, let me know whether it's because you've seen the puzzle before, or are just a genius.

9. There are 40 students in the class. Estimate the probability that there is at least one pair of "birthday twins" in the class--people born on the same date (not necessarily the same year). If you got an answer, what simplifying assumptions did you make? How do you think your answer would change if you used a more accurate model?

Imagine lining up all the students, starting at one end, and asking everyone their birthday. The probability that the second person has a different birthday from the first is 364/365. The probability that the third person has a different birthday from the first two is 363/365. The probability that the second person is different from the first, AND the third person is different from the first two, is 364/365 times 363/365. Continue in this fashion through the 40th person: 364/365 times 363/365 times 362/365 ... times 326/365. The result is 0.109. This is the probability that all 40 birthdays are different, in other words that there are NO birthday twins. The probability that there is at least one pair of birthday twins is therefore 1-0.109=0.891.

This ignores leap years; the possibility of February 29 birthdays would slightly reduce the probability of finding birthday twins. This also assumes that all birthdays are equally likely, when in reality, more children are born at some times of the year than others; in the U.S., August has the most births per day, while April has the fewest. Clustering of births would increase the probability of finding birthday twins.
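Multiplying 39 fractions by hand is tedious, so here is a short Python sketch of the calculation in the answer to question 9. It makes the same simplifying assumptions (365 equally likely birthdays, no leap years); the function name and the `n_days` parameter are just for illustration.

```python
def prob_all_different(n_people, n_days=365):
    """Probability that n_people all have different birthdays,
    assuming every one of n_days is equally likely."""
    prob = 1.0
    for i in range(n_people):
        # The (i+1)th person must avoid the i birthdays already taken
        prob *= (n_days - i) / n_days
    return prob

p_none = prob_all_different(40)
p_twins = 1 - p_none
print(round(p_none, 3), round(p_twins, 3))  # 0.109 0.891
```

Changing the first argument lets you see how quickly the probability climbs with class size.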

Here are four practice questions for the final exam. For each experiment, list the variables that are mentioned in the description, and say whether each is a nominal, measurement, or ranked variable. Don't list variables that are not mentioned in the description; for example, don't list "weight of mice" for the first experiment.

10. You want to know whether food coloring affects the activity of mice. You feed 12 mice Purina Mouse Chow and 15 other mice Purina Mouse Chow with 10 mg yellow food coloring/kg added. You record how many hours a day each mouse spends running on its exercise wheel. You do this on each of 21 days.

food coloring present or absent: nominal
hours a day running: measurement
days since the start of the experiment: measurement
name of each mouse: nominal (this one's kind of subtle; we'll discuss why this is a variable later in the semester)

11. You want to know what kind of cat people like the best. You go to an animal shelter and take a picture of each cat, which you clip to its cage. When a cat gets adopted, the staff of the shelter pins the photo to a bulletin board, starting at the upper left and working their way down. Once all the cats are adopted, you record the position of each photo and the pattern of the cat: solid orange, orange and white, solid black, black and white, calico, etc.

position of photo on the bulletin board (order of adoption): ranked
pattern of cat: nominal

12. You're trying to figure out what trees squirrels like the best. You go to White Clay Creek State Park and find an area with a mix of oak, maple, sycamore, redbud, and dogwood trees. You randomly pick 20 of each species of tree, measure the height and diameter of each tree, and count the number of squirrel nests in each tree.

type of tree: nominal
height: measurement
diameter: measurement
number of squirrel nests in each tree: measurement

13. You want to know what percentage of college students in different majors accept the evidence for evolution. You survey 1,372 undergraduates and ask each student their major, what year they are, their gender, their GPA, how many biology classes they've taken, and whether they agree or disagree with the statement, "All living organisms on earth, including humans, have evolved over billions of years from earlier life by natural processes."

major: nominal
year: either nominal (frosh, sophomore, junior, senior) or measurement (number of years since starting college); this is one of those gray area variables
gender: nominal
GPA: measurement
number of biology classes taken: measurement
agree or disagree: nominal