Chapter 9   Data Presentation Principles

9.1   Types of publication

Purpose

Information is published for several purposes.

Public reports
Most organisations publish reports about specific aspects of their work that are aimed at a wide audience. These are often regular publications such as annual summaries about aspects of the state of the organisation, but specialist reports and press releases may be written about specific projects or topics. Brevity is important, so the information in reports must be presented concisely and clearly.
Executive reports
These are usually internal reports about specific topics. It can often be assumed that the readership is more technically sophisticated, so more advanced analyses and presentation can be used.
Archival and regular publications
There is often a need to provide access to detailed data to allow further analysis. Large tomes of tabulated data were once published by government organisations, but such data are often provided electronically now. In spreadsheet format, users can easily input the data into other programs.
Some organisations have a statutory obligation to provide certain information at regular intervals (often annually). This information is usually archival, but sometimes a 'Yearbook' is published for a wider audience and its format is therefore closer to a report.

Care must be taken to present information clearly in all types of publication, but it is most important for publications aimed at the general public and least important for archival data.

Publication medium

A few decades ago, all information was published in black and white on paper. Colour is now used more often in paper publications, but an increasing proportion of publications are produced only electronically for display on a computer. Such computer publications are often provided over the internet.

The use of colour in both paper-based and electronic publications makes it much easier to present information clearly, especially in graphs.

Quality and resolution of medium
Colour print on glossy paper is best but expensive. Computer screens also show colour but their resolution is relatively low -- typically a tenth of the number of 'dots' per unit area that would be used on paper. The poorer resolution of a computer screen limits the type size of text and the detailed information that can be presented, especially if the screen is small. Newsprint and many other low-quality printing processes have reasonable resolution but allow only limited use of colour, if any.
Cost
It is much cheaper to make large quantities of information available on the internet than on paper. However the cost of producing publications on paper tends to ensure that they are designed better -- there is a conscious effort to cut back the presentation to the essentials that convey the intended message. On a web site, it is much harder to create a user interface that allows users to find the published information easily. It is critically important that web sites are easy to navigate, and the cost of designing the structure of the web site must be included in the cost of publishing the information.
Format
Many reports are published on the internet as 'pdf' files. These are the electronic equivalent of a series of printed pages and can be laid out with text and graphics in the same way as is done on paper. Although pdf files are still limited by the screen resolution of the computer, it is possible to zoom in and to print them at much higher resolution.
An alternative is to publish information directly as web pages that can be viewed from a web browser. Although the ability to format such web pages was limited until recently, web pages can now be laid out almost as flexibly as on paper (using functionality called cascading style sheets). It is possible to make a web site interactive, both for navigation around the site and for interacting with individual pages. (CAST itself is an example of what is possible.)
Archival data can be directly provided on the internet in spreadsheet format. Alternatively, large organisations often have a web-based interface that can extract information from a large behind-the-scenes database. For example, the United Nations provides a wealth of information that can be accessed at http://unstats.un.org/unsd/cdb.

Some types of publication must be printed on paper in order to reach a wide audience. In contrast, archival information is best provided on a web site because there are likely to be few users and they will find the information most useful electronically. However with computers and internet connections becoming more common, a well-designed web site is becoming a very powerful and flexible way to present information.

There is no perfect publication medium, but a well-designed web site should be considered.

9.2   From data to information

Selecting information to display

Most organisations have access to large amounts of information about any topic.

Any report must be very selective about the information that is displayed. It should concisely summarise the important features of the data. The decision about what is the important information in data is subjective but is critically important.

The report should summarise the most important features of the data.

Signal and noise

The distinction between the 'important' information in data and the rest is similar to the distinction between signal and noise in electronics and telecommunications. Engineers distinguish between the signal that is being communicated between two locations and the noise that is added by the communications channel. The noise degrades the signal and, in the worst cases, can make the signal difficult to detect.

Signal   =   information you want
Noise   =   'random' modification to the signal

As in electronics, an important goal of data presentation is to extract the 'signal' from a data set and clearly display it without the 'noise' of the remaining detailed information in the data.

Example of signal and noise

The word 'CAST' can be seen below. This is the signal in the picture.

Drag the slider to the right to add random noise to the picture. Even when the slider is in the middle, it is becoming hard to distinguish the signal in the picture.

Adding noise makes it harder to detect the signal in the picture.
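
The effect of noise on a picture can be imitated with a few lines of Python. This is only a rough sketch and is not the display used in CAST: the small bitmap, the function names and the rule of flipping each pixel with a fixed probability are all assumptions made for illustration.

```python
import random

# Hypothetical 5-row bitmap standing in for the picture ('#' = dark pixel).
PICTURE = [
    ".###..##...###.###",
    "#....#..#.#.....#.",
    "#....####..##...#.",
    "#....#..#....#..#.",
    ".###.#..#.###...#.",
]


def add_noise(picture, level, seed=0):
    """Flip each pixel independently with probability `level` (0 to 1)."""
    rng = random.Random(seed)  # fixed seed so the output is repeatable
    flip = {"#": ".", ".": "#"}
    return [
        "".join(flip[p] if rng.random() < level else p for p in row)
        for row in picture
    ]


def agreement(a, b):
    """Fraction of pixels that are identical in two pictures of equal shape."""
    pairs = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(x == y for x, y in pairs) / len(pairs)


for level in (0.0, 0.1, 0.3, 0.5):
    noisy = add_noise(PICTURE, level)
    print(f"noise level {level:.1f}: agreement {agreement(PICTURE, noisy):.2f}")
    print("\n".join(noisy))
    print()
```

With a flip probability of 0.5 every pixel is effectively random, so nothing of the signal remains; the letters usually become hard to read well before that point.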

Prune out unnecessary detail

Many reports are filled with bar charts, pie charts, histograms, scatterplots and a variety of other plots, just because the researcher produced them when exploring the data! It is sometimes useful to ask yourself 'What single display of the data conveys the information most relevant to the message I wish to convey?' If there is more than one thing to convey, then more charts may be needed, but at least this priority approach prevents you from spending too much time on less important details, and at the same time encourages you to decide what really is important. Graphics, tables and text should only be included if they add new and interesting information about the data.

Before producing a report or other publication, it is important to first identify the most important information that you want to convey.

9.3   Tables, graphs and text

Reports present information in three main ways -- tables, graphs and text.

Textual descriptions

Paragraphs of text are rarely adequate descriptions of the information in data on their own. Graphical and tabular displays invariably convey information much more clearly and in a much more immediate and memorable way. However graphics and tables must be integrated into a report, and there is definitely a place for text to describe the source and background of the data and to summarise the notable features of the display.

Text should be used to summarise and interpret information in tables and graphs, but not to simply repeat in words information that has already been clearly presented in another form. Such repetition tends to obscure rather than inform.

Tables

Tabular displays are often effective summaries of very simple data sets. For example, the following table describes the New Zealand defence force personnel in 2005 as concisely as any graphical display.

             Count   Percentage
Navy         1,910       22%
Army         4,438       52%
Air force    2,266       26%
Total        8,614      100%
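
The percentages in a table like this are simple to reproduce from the counts. The following Python sketch uses the three counts from the table; the function name and the output layout are illustrative choices only.

```python
# Personnel counts copied from the table above.
counts = {"Navy": 1910, "Army": 4438, "Air force": 2266}


def percentage_table(counts):
    """Return (label, count, percentage rounded to a whole number) rows."""
    total = sum(counts.values())
    rows = [(name, n, round(100 * n / total)) for name, n in counts.items()]
    # The total row is 100% by definition; rounded rows need not always
    # add up to exactly 100, although they do for these counts.
    rows.append(("Total", total, 100))
    return rows


for name, n, pct in percentage_table(counts):
    print(f"{name:<10}{n:>7,}{pct:>6}%")
```

Rounding to whole percentages is usually enough for a summary table; extra decimal places add detail without adding information for most readers.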

For a very non-technical audience, however, a pie chart may be preferred.

Larger tables can concisely present a lot of detailed information to the reader. However the danger is that the ‘broad picture’ is often obscured by the detail -- it is hard to see the wood for the trees. There is often a way to display the data graphically that makes the 'signal' in the data more prominent.

Large tables should usually be summarised briefly in the body of a report with the full table relegated to an appendix or made available for download from a web site. In particular, the availability of large tables of raw data on a web site may make it easier for technically able readers to do further analysis of the data.

Graphical displays

Graphical displays such as bar charts, pie charts, histograms, maps and scatter plots are particularly effective ways to convey information since the human eye can readily detect, interpret and retain patterns. There are many further ways to graphically display information.

What is it that makes a statistical display of data excellent? There is no better discussion of this than in Tufte’s book, “The Visual Display of Quantitative Information”. You are encouraged to examine the many examples of the art which are shown in that text. To quote Tufte,

Excellence in statistical graphics consists of complex ideas communicated with clarity, precision and efficiency.

Even simple ideas need to be presented with that same clarity, precision and efficiency. Any statistical graphic should show the data efficiently and truthfully, should not distort the information in the data and should be closely integrated with numerical and verbal descriptions of the data.

Annotation

A good graphical or tabular display can often ‘speak for itself’ -- its message is immediately clear without further explanation. However it often helps to write comments on a diagram (a) to point out important features and (b) to add extra information such as labels that give extra insight. Two examples of annotated graphics are shown below.

Atomic weight and volume

The following diagram was printed in Tufte's book. The added text and grey lines on the scatterplot help to highlight the periodicity of properties of chemical elements. A brief textual commentary in the main text would also help.

Life expectancy and income

The scatterplot below shows the life expectancy and gross national income (GNI) per capita in all countries in 2004. The annotation highlights seven countries whose life expectancy is much less than would be expected from their GNI and gives an explanation. It also labels the country with the highest GNI per capita and the caption summarises the relationship in words.


9.4   Combining simple graphs

Multiple simple graphs

Many publications only include simple graphical displays of data such as bar charts and time series. It is easy to confuse the reader by incorporating too much information in a graphic, but simple graphics can sometimes be combined in rich ways that encourage the reader to investigate the relationships between different measurements.

To be effective, the different elements of the display should usually be linked to either a time axis (e.g. a collection of related time series drawn against a common time axis) or a map (e.g. pie charts drawn next to each region).

Consider whether different information can be linked in a single display.

When planning any such diagram, critically consider whether it may be too complex to be easily understood by the intended audience.

New York weather in 1980

The following diagram was published in the New York Times to describe the weather in 1980.

This diagram was commended by Edward Tufte as a graphic that is extremely rich in data but is still easily understandable. It shows:

The format encourages the reader to look for relationships between temperature, rainfall and humidity. For example, the daily temperature range (maximum – minimum) was typically lower in the winter than in the summer (shaded vertical distance between the minimum and maximum time series). Also, the period from late August to early September was much warmer than normal. (Both maximum and minimum daily temperatures tended to be higher than normal during this period.)

Also note the use of annotations to indicate the warmest and coldest days in the year.

New Zealand wool exports in 1879

The next example is a map that appeared in the Bateman New Zealand Historical Atlas to present information about wool production and exports in New Zealand in 1879. It effectively shows:

The eye is encouraged to investigate the relationship between the location of sheep farms, the climate and ports.

(The use of 3-dimensional piles of wool bales to represent exports might be criticised, but the scale -- each bale drawn represents 500,000 lb of wool -- means that it is unlikely to mislead here.)

9.5   Advanced graphs

Other types of diagram

We have only described a few general-purpose graphical methods for describing data. These graphs can be applied to data from a wide range of applications and are fairly easy to understand without training. In general, publications that are intended for a wide readership should avoid more complex graphics.

However we briefly note the existence of other ways to present data graphically that are particularly useful in some situations.

The following two examples are only included to show that we have not covered all types of graphical display.

Always use the simplest graphical method that will convey the information that you want.

Monthly rainfall in Samaru

In most of Africa, the most important climatic variable is rainfall. Rainfall is usually highly seasonal and failure of crops is normally associated with late arrival of rain or low rainfall. A better understanding of the distribution of rainfall can affect the crops that are grown and when they are planted.

The diagram below is based on monthly rainfall in Samaru, Northern Nigeria between 1928 and 1983. For each month, the diagram and table show:

The bands in the diagram join up these values for different months.

This diagram is a useful way to describe how rainfall varies throughout the year and to help assess the likelihood of 'ten-year droughts and floods'. It does however require more explanation than would be acceptable in most publications for public consumption.
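
Bands of this kind can be built from summary values calculated separately for each month. The Python sketch below assumes that the values are the 10th percentile, the median and the 90th percentile of the monthly totals; this choice is only suggested by the reference to 'ten-year droughts and floods' and may differ from the values used in the diagram. The rainfall figures are invented for illustration and are not the Samaru records.

```python
import random
import statistics


def monthly_bands(rain_by_month):
    """Map each month to its (10th percentile, median, 90th percentile)."""
    bands = {}
    for month, totals in rain_by_month.items():
        deciles = statistics.quantiles(totals, n=10)  # nine cut points
        bands[month] = (deciles[0], deciles[4], deciles[8])
    return bands


# Invented monthly totals (mm) for 56 years, NOT real Samaru data.
rng = random.Random(1)
typical_mm = {"Jun": 150, "Jul": 220, "Aug": 280}  # hypothetical means
rain_by_month = {
    month: [max(0.0, rng.gauss(mean, 0.3 * mean)) for _ in range(56)]
    for month, mean in typical_mm.items()
}

for month, (low, mid, high) in monthly_bands(rain_by_month).items():
    print(f"{month}: 10% {low:6.1f}   median {mid:6.1f}   90% {high:6.1f} mm")
```

Under this assumption, a monthly total below the lower value would be expected in about one year in ten, and similarly for a total above the upper value.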

Road accidents in Israel

The diagram below was published recently (by D. G. Feitelson) to introduce a new type of diagram called a spie chart. It is based on a standard pie chart of the age and sex distribution in Israel -- the angle (and area) of each segment of the basic circle is in proportion to the number of people of that age and sex.

The darker segments describe the ages of all road accident casualties in Israel in 2002. These segments use the same slices as for the overall population distribution with their radius adjusted to make the areas of the slices in proportion to the number of casualties in that age group and gender.

If casualties followed the same distribution as the rest of the population, the darker segments would form a complete circle; where they bulge out, there are a disproportionate number of casualties. The diagram therefore effectively illustrates the disproportionate number of male casualties and the particular over-representation of males aged 15-44.
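
The radius of each darker segment follows from the rule described above. A slice whose angle covers a share p of the circle has area proportional to p times the square of its radius, so making the area proportional to the casualty share c gives a radius proportional to the square root of c/p. The Python sketch below assumes that the darker segments are scaled to have the same total area as the basic circle, which is consistent with the statement that equal distributions give a complete circle. The group names and numbers are hypothetical.

```python
import math


def spie_radii(population, casualties, base_radius=1.0):
    """Radius of each superimposed slice in a spie chart.

    Slice angles come from the population shares. Each radius is chosen so
    that the slice area is proportional to the casualty share, scaled so
    that equal shares reproduce the basic circle exactly.
    """
    pop_total = sum(population.values())
    cas_total = sum(casualties.values())
    radii = {}
    for group, pop in population.items():
        pop_share = pop / pop_total
        cas_share = casualties[group] / cas_total
        radii[group] = base_radius * math.sqrt(cas_share / pop_share)
    return radii


# Hypothetical figures: group A is 25% of the population but has 50% of
# the casualties, so its slice bulges beyond the basic circle.
example_population = {"A": 25, "B": 75}
example_casualties = {"A": 50, "B": 50}

for group, r in spie_radii(example_population, example_casualties).items():
    print(f"group {group}: radius {r:.3f} times the basic circle")
```

A radius above 1 marks a group that is over-represented among casualties, and a radius below 1 marks one that is under-represented.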

Take care with interpretation of the graph. The bulge in male casualties aged 15-54 could be caused by more reckless drivers of this age, but they probably also spend more time driving.

Although the diagram would be informative to policy makers in transport and health, it would require too much explanation for inclusion in publications intended to be read by the general public.

9.6   Innovative graphics

Other types of graphic

In this chapter, we have concentrated on a few commonly used types of graphical display of data. It is rare for a non-technical publication to go beyond these general-purpose graphs.

However an innovative graphic can sometimes work well for a particular type of data. Two examples are given below.

If your data are of a non-standard type, you may be able to devise a novel way to clearly display them.

Napoleon's invasion of Russia

The diagram below was published by Charles Joseph Minard in 1869. Edward Tufte says that "it may well be the best statistical graphic ever drawn." (The diagram has been redrawn slightly with some place names omitted for a computer screen.)

The diagram portrays Napoleon’s disastrous campaign of 1812-1813 when his army invaded Russia. After occupying Moscow, Napoleon was forced to retreat by the harshness of the Russian winter. The width of the band describes the number of surviving soldiers during the invasion and retreat, so it effectively illustrates where the soldiers died during the campaign. The temperature graph at the bottom shows the temperatures during the retreat.

Maori migration in New Zealand

The indigenous population in New Zealand is Maori and many of them have migrated from rural areas to the cities (mainly Auckland) in the last half century. The diagram below is based on a distorted map of New Zealand in which the areas of the regions are proportional to their Maori populations.

The widths of the arrows represent the numbers migrating between regions, as determined in a survey that was conducted in the 1960s. It clearly shows movements of the Maori population.