Basics

Tests for nominal variables

Descriptive statistics

Tests for one measurement variable

Tests for multiple measurement variables

Multiple tests

Miscellany

Guide to fairly good graphs with Excel


Drawing graphs is an important part of analyzing your data and presenting the results of your research. Here I describe the features of clear, effective graphs, and I outline techniques for generating graphs using Excel (there's a similar page on generating good graphs with Calc, part of the free OpenOffice.org suite of programs).

Some of the default conditions for Excel graphs are annoying, but with a little work, you can get it to produce graphs that are good enough for presentations and web pages.

General tips for all graphs

Choosing the right kind of graph

There are many kinds of graphs--bubble graphs, pie graphs, doughnut graphs, radar graphs--and each may be the best for some kinds of data. By far the most common graphs in scientific publications are scatter graphs and bar graphs.

A scatter graph (also known as an X-Y graph) is used for graphing data sets consisting of pairs of numbers. These could be measurement variables, or they could be nominal variables summarized as percentages. The independent variable is plotted on the X-axis (the horizontal axis), and the dependent variable is plotted on the Y-axis.

The independent variable is the one that you manipulate, and the dependent variable is the one that you observe. For example, you might manipulate salt content in the diet and observe the effect this has on blood pressure. Sometimes you don't really manipulate either variable, you observe them both. In that case, if you are testing the hypothesis that changes in one variable cause changes in the other, put the variable that you think causes the changes on the X-axis. For example, you might plot "height, in cm" on the X-axis and "number of head-bumps per week" on the Y-axis if you are investigating whether being tall causes people to bump their heads more often. Finally, there are times when there is no cause-and-effect relationship, in which case you can plot either variable on the X-axis; an example would be a graph showing the correlation between arm length and leg length.

There are a few situations where it is common to put the independent variable on the Y-axis. For example, in oceanography it is traditional to put "distance below the surface of the ocean" on the Y-axis, with the top of the ocean at the top of the graph, and the dependent variable (such as chlorophyll concentration, salinity, fish abundance, etc.) on the X-axis. Don't do this unless you're really sure that it's a strong tradition in your field.

A bar graph is used for plotting means or percentages for different values of a nominal variable, such as mean blood pressure for people on four different diets. Usually, the mean or percentage is on the Y-axis, and the different values of the nominal variable are on the X-axis, yielding vertical bars.

Sometimes it is not clear whether the variable on the X-axis is a measurement or nominal variable, and thus whether the graph should be a scatter graph or a bar graph. This is most common with measurements taken at different times. In this case, I think a good rule is that if you could have had additional data points in between the values on your X-axis, then you should use a scatter graph; if you couldn't have additional data points, a bar graph is appropriate. For example, if you sample the pollen content of the air on January 15, February 15, March 15, etc., you should use a scatter graph, with "day of the year" on the X-axis. Each point represents the pollen content on a single day, and you could have sampled on other days. When you look at the points for January 15 and February 15, you connect them with a line (even if there isn't a line on the graph, you mentally connect them), and that implies that on days in between January 15 and February 15, the pollen content was intermediate between the values on those days. However, if you sampled the pollen every day of the year and then calculated the mean pollen content for each month, you should plot a bar graph, with a separate bar for each month. This is because the mental connect-the-dots of a scatter graph of these data would imply that the months in between January and February would have intermediate pollen levels, and of course there are no months between January and February.

Drawing scatter graphs with Excel

  1. Put your independent variable in one column, with the dependent variable in the column to its right. You can have more than one dependent variable, each in its own column; each will be plotted with a different symbol. Note that if you have a blank cell or a cell containing anything that's not a number in the middle of a column, you'll get a line graph (which you don't want) instead of the scatter graph that you do want. This is one of the stupidest things about graphing with Excel. One way to get around this is to replace the blanks with "NA()".
  2. If you are plotting 95 percent confidence intervals, standard errors, or some other kind of error bar, put the values in the next column. These should be confidence intervals, not confidence limits; thus if your first data point has an X-value of 7 and a Y-value of 4 ±1.5, you'd have 7 in the first column, 4 in the second column, and 1.5 in the third column. For confidence limits that are asymmetrical, such as the confidence limits on a binomial percentage, you'll need two columns, one for the difference between the percentage and the lower confidence limit, and one for the difference between the percentage and the upper confidence limit.
    Spreadsheet for scatter graph
    An Excel spreadsheet set up for a scatter graph including confidence intervals.
  3. Select the cells that have the data in them. Don't select the cells that contain the confidence intervals.
  4. From the "Insert" menu, choose "Chart" (or click on the little picture of a graph in the task bar). Choose "XY (Scatter)" as your chart type. Do not choose "Line"; the little picture may look like a scatter graph, but it isn't.
  5. The next screen shows the "Data range," the cells that contain your data; you shouldn't need to change anything here.
  6. On the "Titles" tab of the "Chart Options" screen, enter titles for the X axis and Y axis, including the units. A chart title is essential for a graph used in a presentation, but optional in a graph used for a publication (since it will have a detailed caption).
  7. On the "Gridlines" tab of the "Chart Options" screen, get rid of the gridlines; they're ugly and unnecessary.
  8. On the "Legend" tab of the "Chart Options" screen, get rid of the legend if you only have one set of Y values. If you have more than one set of Y values, get rid of the legend if you're going to explain the different symbols in the figure caption; leave the legend on if you think that's the most effective way to explain the symbols.
  9. Click the "Finish" button, but you're far from done. Click on the white area outside the graph to select the whole image, then drag the sides or corners to make the graph the size you want.
  10. Click in the gray area inside the graph, choose "Selected Plot Area" from the "Format" menu, and then choose "None" under "Area." This will get rid of the ugly gray background. Under "Border," make the color of the border black instead of gray.
  11. Click on the Y-axis, choose "Selected Axis" from the "Format" menu, and make modifications to the tick marks, font and number format. Most publications recommend sans-serif fonts (such as Arial, Geneva, or Helvetica) for figures. On the "Font" tab, unselect "Auto scale," otherwise the font size will change when you change the graph size. On the "Scale" tab, set the minimum and maximum values of Y. The maximum should be a nice round number, somewhat larger than the highest point on the graph. If you're plotting a binomial percentage, don't make the Y-scale greater than 100 percent. If you're going to be adding error bars, the maximum Y should be high enough to include them. The minimum value on the Y scale should usually be zero, unless your observed values vary over a fairly narrow range. A good rule of thumb (that I just made up, so don't take it too seriously) is that if your maximum observed Y is more than twice as large as your minimum observed Y, your Y scale should go down to zero. If you're plotting multiple graphs of similar data, they should all have the same scales for easier comparison.
  12. Format your X-axis the same way you formatted your Y-axis.
  13. Click on the Y-axis title, choose "Selected Axis Title" from the "Format" menu, and adjust the font. Unselect "Auto scale" so it won't change the font size if you adjust the size of the graph. Do the same for the X-axis title.
  14. Pick one of the symbols, click on it, and choose "Selected Data Series" from the "Format" menu. On the "Patterns" tab, choose the symbol you want (choose "No Color" for the background to get a better view of the symbol choices). If you want to connect the dots with a line, choose that.
  15. If you want to add error bars, go to the "Y Error Bars" tab and choose "Both" under "Display." Ignore the buttons for "standard deviation" and "standard error," even if they sound like what you want. Instead, choose "Custom." Click on the icon next to the box labelled "+" and then select the cells in your spreadsheet containing the upper confidence interval. Do the same for the box labelled "-" and the lower confidence interval (you'll be selecting the same cells, unless you have asymmetrical confidence intervals).
    Steps 1, 2 and 3 of adding error bars
     
    Steps 1, 2 and 3 of adding error bars
     
    Steps 4 and 5 of adding error bars
    Adding error bars to a graph. Repeat steps 3, 4 and 5 for the box labelled "-".
    Steps 4 and 5 of adding error bars
    Adding error bars to a graph. Repeat steps 3, 4 and 5 for the box labelled "-".
  16. Repeat the above for each set of symbols.
  17. If you've added error bars, click on one of them and choose "Selected Error Bars" from the "Format" menu. On the "Patterns" tab, you can adjust the look of the error bars.
  18. If you want to add a regression line, click on one of the symbols and choose "Add Trendline" from the "Chart" menu. Choose which kind you want (choose "Linear" unless you really know what you're doing).
  19. Click in the graph area, outside the graph, to select the whole box that includes the graph and the labels. Choose "Selected Chart Area" from the "Format" menu. On the "Patterns" tab, you'll probably want to make the border be "None." On the "Properties" tab, choose "Don't move or size with cells," so the graph won't change size if you adjust the column widths of the spreadsheet.
  20. You should now have a beautiful, beautiful graph. You can click once on the graph area (in the blank area outside the actual graph), copy it, and paste it into a word processing document, graphics program or presentation.
    Graph of bird abundance
    The number of bird species observed in the Christmas Bird Count vs. latitude at seven locations in Delaware. Data points are the mean number of species for the counts in 2001 through 2006, with 95 percent confidence intervals.

Back-transformed axis labels

If you have transformed your data, don't plot the untransformed data; instead, plot the transformed data. For example, if your Y-variable ranges from 1 to 1000 and you've log-transformed it, you would plot the logs on the Y-axis, which would range from 0 to 3 (if you're using base-10 logs). If you square-root transformed those data, you'd plot the square roots, which would range from 1 to about 32. However, you should put the back-transformed numbers (1 to 1000, in this case) on the axes, to keep your readers from having to do squaring or exponentiation in their heads.

I've put together three spreadsheets with graphs that you can use as templates: a spreadsheet graph with log-transformed or square-root transformed X values, a spreadsheet graph with log-transformed or square-root transformed Y values, or a spreadsheet graph with log-transformed or square-root transformed X and Y values. While they're set up for log-transformed or square-root transformed data, it should be pretty obvious how to modify them for any other transformation.

Regression graph
Abundance of the longnose dace, in number of fish per 75 linear meters of stream, versus nitrate concentration. Fish abundance was square-root transformed for the linear regression.

Drawing bar graphs with Excel

  1. Put the values of the independent variable (the nominal variable) in one column, with the dependent variable in the column to its right. The first column will be used to label the bars or clusters of bars. You can have more than one dependent variable, each in its own column; each will be plotted with a different pattern of bar.
  2. If you are plotting 95 percent confidence intervals or some other kind of error bar, put the values in the next column. These should be confidence intervals, not confidence limits; thus if your first data point has an X-value of 7 and a Y-value of 4 ±1.5, you'd have 7 in the first column, 4 in the second column, and 1.5 in the third column. For confidence limits that are asymmetrical, such as the confidence limits on a binomial percentage, you'll need two columns, one for the difference between the percentage and the lower confidence limit, and one for the difference between the percentage and the upper confidence limit.
    Spreadsheet for bar graph
    An Excel spreadsheet set up for a bar graph including confidence intervals.
  3. Select the cells that have the data in them. Do include the first column, with the values of the nominal variable, but don't select cells that contain confidence intervals.
  4. From the "Insert" menu, choose "Chart" (or click on the little picture of a graph in the task bar). Choose "Column" as your chart type, and the picture of bars next to each other (not on top of each other) as the "Chart sub-type." Do not choose the three-dimensional bars, as they just add a bunch of clutter to your graph without conveying any additional information.
  5. The next screen shows the "Data range," the cells that contain your data; you shouldn't need to change anything here.
  6. On the "Titles" tab of the "Chart Options" screen, enter titles for the X-axis and Y-axis, including the units for the Y-axis. A chart title is essential for a graph used in a presentation, but optional in a graph used for a publication (since it will have a detailed caption). Because each bar or cluster of bars will be labelled on the X-axis, you may not need an X-axis title.
  7. On the "Gridlines" tab of the "Chart Options" screen, get rid of the gridlines; they're ugly and unnecessary.
  8. On the "Legend" tab of the "Chart Options" screen, get rid of the legend if you only have one set of Y values. If you have more than one set of Y values, get rid of the legend if you're going to explain the different bar patterns in the figure caption; leave the legend on if you think that's the most effective way to explain the bar patterns.
  9. Click the "Finish" button, but you're not done yet. Click on the white area outside the graph to select the whole image, then drag the sides or corners to make the graph the size you want.
  10. Click in the gray area inside the graph, choose "Selected Plot Area" from the "Format" menu, and then choose "None" under "Area." This will get rid of the ugly gray background. Under "Border," make the color of the border black instead of gray.
  11. Click on the Y-axis, choose "Selected Axis" from the "Format" menu, and make modifications to the tick marks, font, and number format. Most publications recommend sans-serif fonts (such as Arial, Geneva, or Helvetica) for figures. On the "Font" tab, unselect "Auto scale," otherwise the font size will change when you change the graph size. On the "Scale" tab, set the minimum and maximum values of Y. The maximum should be a nice round number, somewhat larger than the highest point on the graph. If you're plotting a binomial percentage, don't make the Y-scale greater than 100 percent. If you're going to be adding error bars, the maximum Y should be high enough to include them. The minimum value on the Y scale should usually be zero, unless your observed values vary over a fairly narrow range. A good rule of thumb (that I just made up, so don't take it too seriously) is that if your maximum observed Y is more than twice as large as your minimum observed Y, your Y scale should go down to zero. If you're plotting multiple graphs of similar data, they should all have the same scales for easier comparison.
  12. Format your X-axis the same way you formatted your Y-axis. It doesn't have a scale, of course. You may want to get rid of the tick marks, they don't really serve a purpose.
  13. Click on the Y-axis title, choose "Selected Axis Title" from the "Format" menu, and adjust the font. Unselect "Auto scale" so it won't change the font size if you adjust the size of the graph. Do the same for the X-axis title.
  14. Pick one of the bars, click on it, and choose "Selected Data Series" from the "Format" menu. On the "Patterns" tab, choose the color you want. Click on "Fill effects," then the "Pattern" tab to get halftone grays (little black dots), hatching, and other patterns that work well in black-and-white. On the "Options" tab, you can adjust the width of the bars by changing the "Gap width."
  15. If you want to add error bars, go to the "Y Error Bars" tab and choose "Both" under "Display." Ignore the buttons for "standard deviation" and "standard error," even if they sound like what you want. Instead, choose "Custom." Click on the icon next to the box labelled "+" and then select the cells in your spreadsheet containing the upper confidence interval. Do the same for the box labelled "-" and the lower confidence interval (you'll be selecting the same cells, unless you have asymmetrical confidence intervals).
  16. Repeat the above for each set of bars.
  17. If you've added error bars, click on one of them and choose "Selected Error Bars" from the "Format" menu. On the "Patterns" tab, you can adjust the look of the error bars.
  18. Click in the graph area, outside the graph, to select the whole box that includes the graph and the labels. Choose "Selected Chart Area" from the "Format" menu. On the "Patterns" tab, you'll probably want to make the border be "None." On the "Properties" tab, choose "Don't move or size with cells," so the graph won't change size if you adjust the column widths of the spreadsheet.
  19. You should now have a beautiful, beautiful graph. You can click once on the graph area (in the blank area outside the actual graph), copy it, and paste it into a word processing document, graphics program or presentation.
    Bar graph of bird abundance
    The number of bird species observed in the Christmas Bird Count at seven locations in Delaware. Data points are the mean number of species for the counts in 2001 through 2006, with 95 percent confidence intervals.

Back-transformed axis labels in bar graphs

If you have transformed your data, don't plot the untransformed data; instead, plot the transformed data. For example, if you've done an anova of log-transformed data, the bars should represent the means of the log-transformed values. Excel has an option to make the X-axis of a bar graph on a log scale, but it's pretty useless, as it only labels the tick marks at 1, 10, 100, 1000…. The only way I know of to get the labels at the right place is to format the axis to not have labels or tick marks, then use the drawing and text tools to put tick marks and labels at the right positions. Get the graph formatted and sized the way you want it, then put in dummy values for the first bar to help you position the tick marks. For example, if you've log-transformed the data and want to have 10, 20, 50, 100, 200, on the Y-axis, give the first bar a value of LOG(10), then use the drawing tools to draw a tick mark even with the top of the bar, then use the text tool to label it "10". Change the dummy value to LOG(20), draw another tick mark, and so on.

Exporting Excel graphs to other formats

Once you've produced a graph, you'll probably want to export it to another program. You may want to put the graph in a presentation (Powerpoint, Keynote, Impress, etc.) or a word processing document. You should be able to click in the graph to select the whole thing, copy it, then paste it into your presentation or word processing document. Sometimes, this will be good enough quality for your purposes.

You'll often want to put the graph in a graphics program, so you can refine the graphics in ways that aren't possible in Excel, or so you can export the graph as a separate graphics file. This is particularly important for publications, where you need each figure to be a separate graphics file in the format and high resolution demanded by the publisher. With earlier versions of Excel, this was easy to do; you could copy a graph, paste it into a drawing program, then "ungroup" the elements and change fonts, symbols, colors, and more. The stupidheads at Microsoft have made newer versions of Excel much worse, so it is much more difficult to make publication-quality graphs with Excel.

The reason it is hard to use Excel for publication-quality graphs is because it produces bitmap images with low (72 pixels per inch) resolution. This looks fine on a computer monitor, but lousy in print. For publications, you need either a bitmap image with much higher resolution (such as 600 dots per inch), or a vector image.

A vector image file stores the drawing as a set of descriptions of different elements. A black dot, 1 mm in diameter, would be described in a vector image file as "draw a circle with the center at 40 mm right and 50 mm up from the lower left corner of the drawing; make the circle have a diameter of 1 mm; and fill in the circle with black." Common examples of vector graphics formats include SVG (Scaleable Vector Graphics), ODG (OpenDocument Graphics), and WMF/EMF (Windows Metafile/Extended Metafile). Commercial drawing programs such as Adobe Illustrator and CorelDraw, and free programs such as Inkscape and OpenOffice.org Draw, work with vector graphics.

Bitmap image files store an image as a description of each pixel. A black dot might be described as "Put 7 white pixels, then a black pixel, then 7 white pixels; on the next line, put 6 white, 3 black, and 6 white pixels; ..." Common bitmap formats include JPEG, TIFF, GIF, and BMP; commercial programs such as Adobe Photoshop and free programs such as GIMP work with bitmap images.

Excel stores graphs in a vector format, but with the new "improved" versions of Excel, it isn't possible to copy the graph in vector format and paste it into a drawing program (at least it doesn't work for me—if you know more about it than I do, please e-mail me). It's also not possible to increase the resolution of the graph within Excel. With older versions of Excel, it was possible to copy a graph, paste it into a drawing program, ungroup the individual elements and have them in vector format. I don't know exactly when this ability was removed, but if you have an older version of Excel, try copying and pasting a graph from Excel into a drawing program, then zoom in and see if it is a vector or bitmap image. If it is a vector image, don't ever upgrade your version of Excel, as the more recent versions are worse than the one you have.

Letter e in bitmap and vector formats
A 12-point Helvetica letter "e," enlarged 8 times. The letter on the left is from an Excel graph, showing the poor resolution. The letter on the rigth is from a drawing program (Inkscape).
Letter e in bitmap and vector formats
A 12-point Helvetica letter "e," enlarged 8 times. The letter on the left is from an Excel graph, showing the poor resolution. The letter on the right is from a drawing program (Inkscape).

To make publication-quality graphs with Excel, you have the following options:



Return to the Biological Data Analysis syllabus

Return to John McDonald's home page

This page was last revised September 12, 2009. Its address is http://udel.edu/~mcdonald/statgraph.html. It may be cited as pp. 274-286 in: McDonald, J.H. 2009. Handbook of Biological Statistics (2nd ed.). Sparky House Publishing, Baltimore, Maryland.

©2009 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.