Due to a variety of genetic, developmental, and environmental factors, no two organisms are exactly alike. This means that when you design an experiment to try to see whether variable X causes a difference in variable Y, you should always ask yourself, is there some variable Z that could cause an apparent relationship between X and Y?
As an example of such a confounding variable, imagine that you want to compare the amount of insect damage on leaves of American elms (which are susceptible to Dutch elm disease) and Princeton elms, a strain of American elms that is resistant to Dutch elm disease. You find 20 American elms and 20 Princeton elms, pick 50 leaves from each, and measure the area of each leaf that was eaten by insects. Imagine that you find significantly more insect damage on the Princeton elms than on the American elms (I have no idea if this is true).
It could be that the genetic difference between the types of elm directly causes the difference in the amount of insect damage. However, there are likely to be some important confounding variables. For example, many American elms are decades old, while the Princeton strain of elms was made commercially available only recently, so any Princeton elms you find are probably only a few years old. American elms are often treated with fungicide to prevent Dutch elm disease, while this wouldn't be necessary for Princeton elms. American elms in some settings (parks, streetsides, the few remaining in forests) may receive relatively little care, while Princeton elms are expensive and are likely planted by elm fanatics who take good care of them (fertilizing, watering, pruning, etc.). It is easy to imagine that any difference in insect damage between American and Princeton elms could be caused, not by the genetic differences between the strains, but by a confounding variable: age, fungicide treatment, fertilizer, water, pruning, or something else.
Designing an experiment to eliminate differences due to confounding variables is critically important. One way is to control all possible confounding variables. For example, you could plant a bunch of American elms and a bunch of Princeton elms, then give them all the same care (watering, fertilizing, pruning, fungicide treatment). This is possible for many variables in laboratory experiments on model organisms.
When it isn't practical to keep all the possible confounding variables constant, another solution is to statistically control for them. You could measure each confounding variable you could think of (age of the tree, height, sunlight exposure, soil chemistry, soil moisture, etc.) and use a multivariate statistical technique to separate the effects of the different variables. This is common in epidemiology, because carefully controlled experiments on humans are often impractical and sometimes unethical. However, the analysis, interpretation, and presentation of complicated multivariate analyses are not easy.
The third way to control confounding variables is to randomize them. For example, if you are planting a bunch of elm trees in a field and are carefully controlling fertilizer, water, pruning, etc., there may still be some confounding variables you haven't thought of. One side of the field might be closer to a forest and therefore be exposed to more herbivorous insects. Or parts of the field might have slightly different soil chemistry, or drier soil, or be closer to a fence that insect-eating birds like to perch on. To control for these variables, you should assign the American and Princeton elms to planting positions at random, so that they are mixed throughout the field, rather than planting all the American elms on one side and all the Princeton elms on the other. There would still be variation among individuals in your unseen confounding variables, but because the positions were randomized, it would not cause a consistent difference between American and Princeton elms.
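One simple way to randomize the planting positions is to shuffle the list of trees and plant them in the shuffled order. The sketch below assumes 20 trees of each strain and a field laid out as 4 rows of 10 positions; the function name and the layout are illustrative choices, not part of any particular experimental protocol.

```python
import random

def randomized_layout(n_american, n_princeton, seed=None):
    """Return strain labels in random order, one label per planting position."""
    rng = random.Random(seed)
    trees = ["American"] * n_american + ["Princeton"] * n_princeton
    rng.shuffle(trees)  # every ordering of the trees is equally likely
    return trees

layout = randomized_layout(20, 20, seed=1)

# Print the field as 4 rows of 10 positions (A = American, P = Princeton).
for row_start in range(0, len(layout), 10):
    print(" ".join(label[0] for label in layout[row_start:row_start + 10]))
```

Fixing the seed makes the layout reproducible, so you can record exactly which tree went where.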
An important aspect of randomizing possible confounding variables is taking random samples of a population. "Population," in the statistical sense, is different from a biological population of individuals; it represents all the possible measurements of a particular variable. For example, if you are measuring the fluorescence of a pH-sensitive dye inside a kidney cell, the "population" could be the fluorescence at all possible points inside that cell. Depending on your experimental design, the population could also be the fluorescence at all points inside all of the cells of one kidney, or even the fluorescence at all points inside all of the cells of all of the kidneys of that species of animal.
A random sample is one in which all members of a population have an equal probability of being sampled. If you're measuring fluorescence inside kidney cells, this means that all points inside a cell, and all the cells in a kidney, and all the kidneys in all the individuals of a species, would have an equal chance of being sampled.
A perfectly random sample of observations is difficult to collect, and you need to think about how this might affect your results. Let's say you've used a confocal microscope to take a two-dimensional "optical slice" of a kidney cell. It would be easy to use a random-number generator on a computer to pick out some random pixels in the image, and you could then use the fluorescence in those pixels as your sample. However, if your slice was near the cell membrane, your "random" sample would not include any points deep inside the cell. If your slice was right through the middle of the cell, however, points deep inside the cell would be over-represented in your sample. You might get a fancier microscope, so you could look at a random sample of the "voxels" (three-dimensional pixels) throughout the volume of the cell. But what would you do about voxels right at the surface of the cell? Including them in your sample would be a mistake, because they might include some of the cell membrane and extracellular space, but excluding them would mean that points near the cell membrane are under-represented in your sample.
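Picking random pixels from a two-dimensional image might look something like the following sketch. The image size (512 by 512) and the sample size (100) are arbitrary assumptions, and the function name is made up for the example. Each pixel has the same chance of being chosen, and no pixel is chosen twice. Note that this only makes the sample random within the slice; it does nothing about the problem of where in the cell the slice was taken.

```python
import random

def random_pixels(width, height, k, seed=None):
    """Choose k distinct pixels with equal probability; return (row, column) pairs."""
    rng = random.Random(seed)
    # Number the pixels 0 .. width*height - 1 and sample without replacement.
    flat_indices = rng.sample(range(width * height), k)
    # Convert each flat index back to a (row, column) position.
    return [divmod(i, width) for i in flat_indices]

pixels = random_pixels(512, 512, 100, seed=7)
print(pixels[:5])
```

You would then read the fluorescence value at each of these positions and use those values as your sample.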
As another example, let's say you want to estimate the amount of physical activity the average University of Delaware undergrad gets. You plan to attach pedometers to 50 students and count how many steps each student takes during a week. If you stand on a sidewalk and recruit students, one confounding variable would be where the sidewalk is. If it's on North College Avenue, the primary route between the main campus and the remote Christiana Towers dorms, your sample will include students who do more walking than students who live closer to campus. Recruiting volunteers on a sidewalk near a student parking lot, a bus stop, or the student health center could get you more sedentary students. It would be better to pick students at random from the student directory and ask them to volunteer for your study. However, motivation to participate would be a difficult confounding variable to randomize; I'll bet that particularly active students who were proud of their excellent physical condition would be more likely to volunteer for your study than would students who spend all their time looking at great musicians on MySpace and searching YouTube for videos of cats. To get a truly random sample, you'd like to be able to make everyone you chose randomly participate in your study, but they're people, so you can't. Designing a perfectly controlled experiment involving people can be very difficult. Maybe you could put pedometers on cats, instead--that would be pretty funny looking.
This page was last revised August 20, 2009. Its address is http://udel.edu/~mcdonald/statsampling.html. It may be cited as pp. 21-23 in:
McDonald, J.H. 2009. Handbook of Biological Statistics (2nd ed.). Sparky House Publishing, Baltimore, Maryland.
©2009 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.