# Guide to good graphs with Calc

Drawing graphs is an important part of presenting the results of your research. Here I describe the features of clear, effective graphs, and I outline techniques for generating good graphs using Calc, part of the free OpenOffice.org suite of programs. (I've also got a page on good graphs with Excel.) Calc can produce graphs suitable for presentations and publication; its biggest deficiency is a very limited selection of symbols. Make sure you're using the latest version of OpenOffice.org, as earlier versions of Calc did not have a way to add error bars to a graph.

### General tips for all graphs

- Don't clutter up your graph with unnecessary junk. Grid lines, background patterns, 3-D effects, unnecessary legends, excessive tick marks, etc. all distract from the message of your graph.
- Do include all necessary information. Both axes of your graph should be clearly labelled, including measurement units if appropriate. Symbols and patterns should be identified in a legend on the graph, or in the caption. If the graph has "error bars," the caption should explain whether they represent 95 percent confidence intervals, standard errors, standard deviations, or something else.
- Don't use color in graphs for publication. If your paper is a success, many people will be reading photocopies or will print it on a black-and-white printer. If the caption of a graph says "Red bars are mean HDL levels for patients taking 2000 mg niacin/day, while blue bars are patients taking the placebo," some of your readers will just see gray bars and will be confused and angry. For bars, use solid black, empty, gray, cross-hatching, vertical stripes, horizontal stripes, etc. Don't use different shades of gray; they may be hard to distinguish in photocopies. There are enough different symbols that you shouldn't need to use colors.
- Do use color in graphs for presentations. It's pretty, and it makes it easier to distinguish different categories of bars or symbols. But don't use red type on a blue background (or vice-versa), as the eye has a hard time focusing on both colors at once and it creates a distracting 3-D effect. And don't use both red and green bars or symbols on the same graph; from 5 to 10 percent of the males in your audience (and less than 1 percent of the females) have red-green colorblindness and can't distinguish red from green.

### Choosing the right kind of graph

There are many kinds of graphs (bubble graphs, pie graphs, doughnut graphs, radar graphs), and each may be the best for some kinds of data. By far the most common graphs in scientific publications are scatter graphs and bar graphs.

A **scatter graph** (also known as an X-Y graph) is used for graphing data sets consisting of pairs of numbers. These could be measurement variables, or they could be nominal variables summarized as percentages. The independent variable is plotted on the x-axis (the horizontal axis), and the dependent variable is plotted on the y-axis.

The independent variable is the one that you manipulate, and the dependent variable is the one that you observe. For example, you might manipulate salt content in the diet and observe the effect this has on blood pressure. Sometimes you don't really manipulate either variable; you just observe them both. In that case, if you are testing the hypothesis that changes in one variable cause changes in the other, put the variable that you think causes the changes on the x-axis. For example, you might plot "height, in cm" on the x-axis and "number of head-bumps per week" on the y-axis if you are investigating whether being tall causes people to bump their heads more often. Finally, there are times when there is no cause-and-effect relationship, in which case you can plot either variable on the x-axis; an example would be a graph showing the correlation between arm length and leg length.

There are a few situations where it makes sense to put the independent variable on the Y-axis. For example, in oceanography it is traditional to put "distance below the surface of the ocean" on the Y-axis, with the top of the ocean at the top of the graph, and the dependent variable (such as chlorophyll concentration, salinity, fish abundance, etc.) on the X-axis. Don't do this unless you're really sure that it's a strong tradition in your field.

A **bar graph** is used for plotting means or percentages for different values of a nominal variable, such as mean blood pressure for people on four different diets. Usually, the mean or percentage is on the Y-axis, and the different values of the nominal variable are on the X-axis, yielding vertical bars.
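Before you make a bar graph, each value of the nominal variable needs a row in the spreadsheet holding its label, its mean, and (if you want error bars) a measure such as the standard error. A minimal Python sketch of that summarizing step follows; the function name `group_means_and_se` and the choice of standard error (sample standard deviation divided by the square root of the sample size) are illustrative assumptions, not anything built into Calc, and each group needs at least two observations.

```python
import math
import statistics

def group_means_and_se(data):
    """Return one (label, mean, standard error) row per group.

    `data` maps each value of the nominal variable (e.g. a diet name)
    to a list of observations for that group. Hypothetical helper.
    """
    rows = []
    for label, values in data.items():
        mean = statistics.mean(values)
        # standard error = sample standard deviation / sqrt(sample size)
        se = statistics.stdev(values) / math.sqrt(len(values))
        rows.append((label, mean, se))
    return rows
```

Each returned row maps directly onto three spreadsheet columns: the label used for the bar, the bar height, and the error-bar value.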

Sometimes it is not clear whether the variable on the x-axis is a measurement or nominal variable, and thus whether the graph should be a scatter graph or a bar graph. This is most common with measurements taken at different times. In this case, I think a good rule is that if you could have had additional data points in between the values on your x-axis, then you should use a scatter graph; if you couldn't have additional data points, a bar graph is appropriate. For example, if you sample the pollen content of the air on January 15, February 15, March 15, etc., you should use a scatter graph, with "day of the year" on the x-axis. Each point represents the pollen content on a single day, and you could have sampled on other days. When you look at the points for January 15 and February 15, you connect them with a line (even if there isn't a line on the graph, you mentally connect them), and that implies that on days in between January 15 and February 15, the pollen content was intermediate between the values on those days. However, if you sampled the pollen every day of the year and then calculated the mean pollen content for each month, you should plot a bar graph, with a separate bar for each month. This is because the mental connect-the-dots of a scatter graph of these data would imply that the months in between January and February would have intermediate pollen levels, and of course there are no months between January and February.

### Drawing scatter graphs with Calc

- Put your independent variable in one column, with the dependent variable in the column to its right. You can have more than one dependent variable, each in its own column; each will be plotted with a different symbol.
- If you are plotting 95 percent confidence intervals, standard error, or some other kind of error bar, put the values in the next column. These should be confidence intervals, not confidence limits; thus if your first data point has an X-value of 7 and a Y-value of 4±1.5, you'd have 7 in the first column, 4 in the second column, and 1.5 in the third column. For confidence limits that are asymmetrical, such as the confidence limits on a binomial percentage, you'll need two columns, one for the difference between the percentage and the lower confidence limit, and one for the difference between the percentage and the upper confidence limit.
  *Figure: A Calc spreadsheet set up for a scatter graph including confidence intervals.*
- Select the cells that have the data in them. Don't select the cells that contain the confidence intervals.
- From the "Insert" menu, choose "Chart" (or click on the little picture of a graph in the toolbar). Choose "XY (Scatter)" (the picture of a graph with dots on it) as your chart type. Do *not* choose "Line"; the little picture with lines may look like an XY graph, but it isn't.
- Click the "Next" button a couple of times. On the "Chart Elements" screen, enter titles for the X axis and Y axis, including the units. A chart title is essential for a graph used in a presentation, but optional in a graph used for a publication (since it will have a detailed caption). Get rid of the legend if you only have one set of Y values. If you have more than one set of Y values, get rid of the legend if you're going to explain the different symbols in the figure caption; leave the legend on if you think that's the most effective way to explain the symbols.
- Click the "Finish" button, but you're not done yet. Click on the white area outside the graph to select the whole image, then drag the sides or corners to make the graph the size you want.
- Choose "Chart Wall" from the "Format" menu, and then choose "White" on the "Area" tab. This will get rid of the ugly gray background. Under "Lines," make style "Continuous" and the color "Black," to give you a border around the graph.
- Choose "Axis" from the "Format" menu, then "Y axis", and make modifications to the tick marks, font and number format. Most publications recommend sans-serif fonts (such as Arial, Geneva, or Helvetica) for figures. On the "Scale" tab, set the minimum and maximum values of Y. The maximum should be a nice round number, somewhat larger than the highest point on the graph. If you're plotting a binomial percentage, don't make the Y-scale greater than 100 percent. If you're adding error bars, the maximum Y should be high enough to include them. The minimum value on the Y scale should usually be zero, unless your observed values vary over a fairly narrow range. A good rule of thumb (that I just made up, so don't take it too seriously) is that if your maximum observed Y is more than twice as large as your minimum observed Y, your Y scale should go down to zero. If you're plotting multiple graphs of similar data, they should all have the same scales for easier comparison.
- Format your X-axis the same way you formatted your Y-axis.
- Choose "Title" from the "Format" menu, then "Y axis title", and adjust the font. Do the same for the X-axis title.
- Pick one of the symbols, click on it, and choose "Object properties" from the "Format" menu. On the "Line" tab, choose the kind of line you want connecting the points, if any. Then choose the symbol under "Icon." Unfortunately, Calc has a very limited number of symbols; there is no circle, for example. (There is a "Gallery" of cartoonish symbols that are useless for scientific graphs.) As far as I can tell, you also can't make the connect-the-dots line a different color from the symbol background (such as a black line connecting open symbols). If you know of a way to do this, please let me know, as this is really stupid.
- If you want a regression line, select a symbol from the data series, then choose "Trend Lines" from the "Insert" menu.
- Repeat the above for each set of symbols.
- To add error bars, select a symbol from the data series, then choose "Y error bars" from the "Insert" menu. Under "Error Category", choose "Cell Range." In the box next to "Positive(+)", enter the range of cells containing the confidence intervals. Check the "Same value for both" box if the confidence intervals are symmetrical; if the lower confidence interval is different from the upper, enter its range of cells in the "Negative(-)" box. You can change the color and width of the error bars, but unfortunately, the only style available is bars with a "T" at each end.
- Choose "Chart Area" from the "Format" menu. On the "Lines" tab, you'll probably want to set the border to "Invisible."
- You should now have a fairly good-looking graph. You can click once on the graph area, copy it, and paste it into a word processing document, graphics program or presentation.
The number of bird species observed in the Christmas Bird Count vs. latitude at seven locations in Delaware. Data points are the mean number of species for the counts in 2001 through 2006, with 95 percent confidence intervals.
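Calc's error-bar cell ranges want the length of each error bar (the distance from the plotted point to the end of the bar), not the confidence limits themselves. If your statistics output gives you limits, the conversion is a pair of subtractions. Here is a small Python sketch; `error_bar_columns` is a hypothetical helper name, and the numbers in the note after it are made up for illustration.

```python
def error_bar_columns(value, lower_limit, upper_limit):
    """Convert confidence limits into error-bar lengths.

    Returns (minus, plus): the distance from the plotted value down to
    the lower limit (for the "Negative(-)" range) and up to the upper
    limit (for the "Positive(+)" range). Hypothetical helper.
    """
    if not lower_limit <= value <= upper_limit:
        raise ValueError("value must lie between its confidence limits")
    return value - lower_limit, upper_limit - value
```

For the 4±1.5 point used as an example earlier, `error_bar_columns(4, 2.5, 5.5)` gives `(1.5, 1.5)`, so a single column with "Same value for both" is enough. For an asymmetrical interval, such as a hypothetical binomial percentage of 20 with limits of 12 and 31, it gives `(8, 11)`, and the two values go in separate columns.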

#### Back-transformed axis labels

If you have transformed your data, don't plot the untransformed data; instead, plot the transformed data. For example, if your Y-variable ranges from 1 to 1000 and you've log-transformed it, you would plot the logs on the Y-axis, which would range from 0 to 3 (if you're using base-10 logs). If you square-root transformed those data, you'd plot the square roots, which would range from 1 to about 32. However, you should put the back-transformed numbers (1 to 1000, in this case) on the axes, to keep your readers from having to do squaring or exponentiation in their heads.
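The arithmetic behind back-transformed labels is simply the transformation applied to each label: a tick labeled with an untransformed number goes at the transformed position. A minimal Python sketch, where `tick_positions` is a hypothetical helper and you pass in whichever transformation you used (`math.log10` or `math.sqrt` here):

```python
import math

def tick_positions(labels, transform):
    """Pair each back-transformed tick label with its position on the
    transformed axis; e.g. the label 100 sits at 2 on a log10 axis."""
    return [(label, transform(label)) for label in labels]

# Where the labels 1, 10, 100 and 1000 go on a square-root axis.
for label, position in tick_positions([1, 10, 100, 1000], math.sqrt):
    print(f"label {label:>4} goes at {position:.2f}")
```

With `math.log10`, the labels 1, 10, 100 and 1000 land at 0, 1, 2 and 3; with `math.sqrt`, they land at 1, about 3.16, 10 and about 31.6, which is why ticks on a square-root axis are unevenly spaced.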

I've put together three spreadsheets with graphs that you can use as templates: a spreadsheet graph with log-transformed or square-root transformed X values, a spreadsheet graph with log-transformed or square-root transformed Y values, and a spreadsheet graph with log-transformed or square-root transformed X and Y values. While they're set up for log-transformed or square-root transformed data, it should be pretty obvious how to modify them for any other transformation. Although these graphs put the tick marks in the right places, I couldn't figure out how to label the tick marks automatically in Calc; you'll have to copy the graph to a graphics program and add the tick mark labels there.

Abundance of the longnose dace, in number of fish per 75 linear meters of stream, versus nitrate concentration. Fish abundance was square-root transformed for the linear regression.

### Drawing bar graphs using Calc

- Put the values of the independent variable (the nominal variable) in one column, with the dependent variable in the column to its right. The first column will be used to label the bars or clusters of bars. You can have more than one dependent variable, each in its own column; each will be plotted with a different pattern of bar.
  *Figure: A Calc spreadsheet set up for a bar graph including confidence intervals.*
- Select the cells that have the data in them, including the first column, with the values of the nominal variable.
- From the "Insert" menu, choose "Chart" (or click on the little picture of a graph in the toolbar). On the "Choose a chart type" screen, choose "Column" and "Normal," the one with columns next to each other.
- Click the "Next" button a couple of times. On the "Chart Elements" screen, enter titles for the X axis and Y axis, including the units. A chart title is essential for a graph used in a presentation, but optional in a graph used for a publication (since it will have a detailed caption). Get rid of the legend if you only have one set of Y values. If you have more than one set of Y values, get rid of the legend if you're going to explain the different bar patterns in the figure caption; leave the legend on if you think that's the most effective way to explain the patterns.
- Click the "Finish" button, but you're not done yet. Click on the white area outside the graph to select the whole image, then drag the sides or corners to make the graph the size you want.
- Choose "Chart Wall" from the "Format" menu, and then choose "White" on the "Area" tab. This will get rid of the ugly gray background. Under "Lines," make style "Continuous" and the color "Black," to give you a border around the graph.
- Choose "Axis" from the "Format" menu, then "Y axis", and make modifications to the tick marks, font and number format. Most publications recommend sans-serif fonts (such as Arial, Geneva, or Helvetica) for figures. On the "Scale" tab, set the minimum and maximum values of Y. The maximum should be a nice round number, somewhat larger than the highest point on the graph. If you're plotting a binomial percentage, don't make the Y-scale greater than 100 percent. If you're adding error bars, the maximum Y should be high enough to include them. The minimum value on the Y scale should usually be zero, unless your observed values vary over a fairly narrow range. A good rule of thumb (that I just made up, so don't take it too seriously) is that if your maximum observed Y is more than twice as large as your minimum observed Y, your Y scale should go down to zero. If you're plotting multiple graphs of similar data, they should all have the same scales for easier comparison.
- Format your X-axis the same way you formatted your Y-axis.
- Choose "Title" from the "Format" menu, then "Y axis title", and adjust the font. Do the same for the X-axis title.
- Pick one of the bars, click on it, and choose "Object properties" from the "Format" menu. On the "Borders" tab, choose the kind of border you want for the bars, then choose the pattern inside the bar on the "Area" tab. On the "Options" tab, adjust the width of the bars.
- Repeat the above for each set of bars.
- To add error bars, select a bar from the data series, then choose "Y error bars" from the "Insert" menu. Under "Error Category", choose "Cell Range." In the box next to "Positive(+)", enter the range of cells containing the confidence intervals. Check the "Same value for both" box if the confidence intervals are symmetrical; if the lower confidence interval is different from the upper, enter its range of cells in the "Negative(-)" box. You can change the color and width of the error bars, but unfortunately, the only style available is bars with a "T" at each end.
- Choose "Chart Area" from the "Format" menu. On the "Lines" tab, you'll probably want to set the border to "Invisible."
- You should now have a fairly good-looking graph. You can click once on the graph area (in the blank area outside the actual graph), copy it, and paste it into a word processing document, graphics program or presentation.
The number of bird species observed in the Christmas Bird Count at seven locations in Delaware. Data points are the mean number of species for the counts in 2001 through 2006, with 95 percent confidence intervals.
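The Y-axis advice given for both scatter graphs and bar graphs (a round maximum somewhat above the highest point and its error bar, a minimum of zero unless the values vary over a narrow range, and no more than 100 percent for a binomial percentage) can be written down as a function if you're making many graphs. This Python sketch is just one reading of that made-up rule of thumb, not a Calc feature; the function name, the rounding `step`, and the `cap` parameter are assumptions, and it assumes all Y values are positive.

```python
import math

def y_axis_range(points, step, cap=None):
    """Suggest (minimum, maximum) for the Y axis. Hypothetical helper.

    points: (y, minus, plus) tuples giving each plotted value and the
            lengths of its lower and upper error bars (0, 0 if none).
    step:   the round number the axis limits should be multiples of.
    cap:    optional ceiling, e.g. 100 for a binomial percentage.
    Assumes all Y values are positive.
    """
    ys = [y for y, _, _ in points]
    lowest = min(y - minus for y, minus, _ in points)
    highest = max(y + plus for y, _, plus in points)

    # Next multiple of `step` strictly above the highest error-bar end.
    maximum = (math.floor(highest / step) + 1) * step
    if cap is not None:
        maximum = min(maximum, cap)

    # Rule of thumb: go down to zero if the largest observed Y is more
    # than twice the smallest; otherwise use a round number below the data.
    if max(ys) > 2 * min(ys):
        minimum = 0
    else:
        minimum = math.floor(lowest / step) * step
    return minimum, maximum
```

For example, `y_axis_range([(20, 3, 3), (45, 5, 5), (87, 4, 4)], step=10)` gives `(0, 100)`, while the narrow-range data `[(52, 2, 2), (60, 3, 3)]` with `step=5` gives `(50, 65)`. Whatever the function suggests, remember to use the same scales on multiple graphs of similar data.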

#### Back-transformed axis labels in bar graphs

If you have transformed your data, don't plot the untransformed data; instead, plot the transformed data. For example, if you've done an anova of log-transformed data, the bars should represent the means of the log-transformed values. Calc has an option to put the Y-axis of a bar graph on a log scale, but it's pretty useless, as it only labels the tick marks at 1, 10, 100, 1000, and so on. The only way I know of to get the labels in the right places is to format the axis to have no labels or tick marks, then use the drawing and text tools to put tick marks and labels at the right positions. Get the graph formatted and sized the way you want it, then put in dummy values for the first bar to help you position the tick marks. For example, if you've log-transformed the data and want to have 10, 20, 50, 100, and 200 on the Y-axis, give the first bar a value of LOG(10), use the drawing tools to draw a tick mark even with the top of the bar, then use the text tool to label it "10". Change the dummy value to LOG(20), draw another tick mark, and so on.
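Two sets of numbers matter in this procedure: the height of each real bar, which is the mean of the log-transformed values, and the dummy heights used to line up the tick marks. This Python sketch computes both, assuming base-10 logs (Calc's LOG() uses base 10 when no base is given); the function names are hypothetical.

```python
import math
import statistics

def log_bar_height(values):
    """Height of one group's bar: the mean of its base-10 logs."""
    return statistics.mean(math.log10(v) for v in values)

def tick_dummy_values(labels):
    """Dummy heights, equivalent to typing LOG(10), LOG(20), ... into the
    first bar, at which to draw each labeled tick mark."""
    return {label: math.log10(label) for label in labels}

print(tick_dummy_values([10, 20, 50, 100, 200]))
```

Raising 10 to the power of a bar's height gives the geometric mean of that group, which is the value a reader sees when reading the bar against the back-transformed tick labels.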

### Exporting Calc graphs to other formats

Once you've produced a graph, you'll probably want to export it to another program. You may want to put the graph in a presentation (PowerPoint, Keynote, Impress, etc.) or a word processing document. You should be able to click in the graph to select the whole thing, copy it, then paste it into your presentation or word processing document. Sometimes, this will be good enough quality for your purposes.

You'll often want to put the graph in a graphics program, so you can refine the graphics in ways that aren't possible in Calc, or so you can export the graph as a separate graphics file. This is particularly important for publications, where you need each figure to be a separate graphics file in the format and high resolution demanded by the publisher. This is actually easier to do with Calc than with Excel.

To make publication-quality graphs with Calc, get your graph the way you like it, then copy it and paste it into Draw. In Draw, choose "Break" from the "Modify" menu. You may have to choose "Break" a second time; keep choosing it until the word is grayed out in the menu. At that point, you've broken the graph into its individual elements, and you can modify them individually. You can change the font or color of text, change line thicknesses, change the pattern or color of filled-in rectangles, etc.

Once you have the graph the way you want it, save it. Then export a copy to the file format you want: probably .eps or .tif for publications, .gif for web pages.


This page was last revised September 9, 2009. Its address is http://udel.edu/~mcdonald/statgraphcalc.html. It may be cited as pp. 287-296 in: McDonald, J.H. 2009. Handbook of Biological Statistics (2nd ed.). Sparky House Publishing, Baltimore, Maryland.

©2009 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.