Just because you have data doesn’t mean that you should make a visualization. So when does the data warrant a visualization? The first question to ask yourself is:

Does the data tell an interesting story?

If you weren’t able to draw any kind of visualization, is there anything in the data worth telling someone about? In other words, don’t compensate for a boring story with an overwrought attempt at visualization.

The next thing to consider: Will a visualization tell the story in a way that words can’t? Visualizations shouldn’t be seen as ways to break up the text in a long report. They should, at the very least, clarify the data.

John Snow’s Cholera Map

John Snow’s cholera map is rightfully one of the most celebrated examples of data visualization, not least because it’s associated with the birth of modern epidemiology.

Instead of plotting a time-series, which would simply report each day's bad news, Snow constructed a graphical display that provided direct and powerful testimony about a possible cause-effect relationship.

As Edward Tufte demonstrates, there were plenty of ways to badly plot this data:

img

The use of the Voronoi diagram

However, author Steven Johnson argues that the dot map was not itself a breakthrough, as dot maps had been used before to map disease. More importantly, the map did not counter the competing theories for cholera’s spread:

After presenting to the Epidemiological Society, Snow had realized that his original map was still vulnerable to a miasmatic interpretation. Perhaps the concentration of deaths around the Broad Street pump was merely evidence that the pump was releasing noxious fumes into the air. And so Snow realized he needed a way to represent graphically the foot-traffic activity around the pump that he had so painstakingly reconstructed. He needed to show lives, not just deaths; he needed to show the way the neighborhood was actually traversed by its residents.

The second version of Snow’s map below, via the John Snow archive, marks the boundary between the area closer to the Broad Street pump and the areas closer to the surrounding pumps. This kind of visualization is now called a Voronoi diagram, a name it acquired later, after the mathematician Georgy Fedosievych Voronyi.

img
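
The idea behind the boundary can be sketched in a few lines of Python: assign every location to whichever pump is nearest. The pump names other than Broad Street, the coordinates, and the house locations below are all invented for illustration. The quoted passage suggests Snow cared about how residents actually walked, so the straight-line distance used here is a simplification.

```python
import math

# Hypothetical pump coordinates on an arbitrary grid, not Snow's data.
PUMPS = {
    "Broad Street": (0.0, 0.0),
    "Pump B": (4.0, 0.0),
    "Pump C": (0.0, 5.0),
}

def nearest_pump(point, pumps=PUMPS):
    """Name of the pump closest to point by straight-line distance."""
    return min(pumps, key=lambda name: math.dist(point, pumps[name]))

# Hypothetical house locations; each one lands in exactly one pump's cell.
houses = [(1.0, 1.0), (3.0, 0.5), (0.5, 4.0)]
cells = [nearest_pump(house) for house in houses]
```

The set of all points that map to the same pump is that pump's cell, and the lines where the nearest pump changes form the boundary.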

When maps shouldn’t be maps

Standard elections map

img

Because of their general role in helping us locate where we are in relation to, well, anything, maps have a universal appeal and familiarity. However, they are not always the best way to show data. Even John Snow’s elegant dot map, without additional data and research, could mislead readers into supporting a “miasmatic” theory.

From Matt Ericson’s classic essay on modern news maps:

img

But while maps like [the above] are interesting to look at, it also forces readers who want to figure out the correlation between income and flooding to try and visually sum up all the colors on the map in their head. The map shows there’s low-income areas in the flooded areas and there’s also low-income areas outside the flooded areas. There’s middle- and upper-income areas in each, too. Unless the pattern is super clearcut, trying to figure out how much of a relationship exists is a tricky task.

Alternatives to mapping

The Big Board view

Because the representation of each state is the same size, this chart doesn’t make the point spread obvious. However, it does show through horizontal position how Obama won where it counted (in the tossup states).

img

Elections Scenarios Chart

A great example of how an infographic can bring clarity to data in a way that words can’t. Without any interaction, this NYT diagram of election outcomes immediately shows the viewer which states are at stake and makes clear how lopsided the election odds are under the U.S. electoral system. When you actually interact with the graphic to test out scenarios, a new dimension appears: how narrow Romney’s chances are if he loses any of the battleground states.

img

In class examples

Class break. Go over these examples:

Gay rights in the US, state by state, by The Guardian

This is a compact way to show roughly 350 data points in one screen. The radial design is not as easy to interpret as a typical bar chart, but it does allow for the clever positioning of each region (the southeast is in the lower-right of the circle). It’s hard to imagine how the many “levels” of rights could be represented in an ordinary U.S. map of the states.

img

Hate Map, via Humboldt University

Technical problems mar this map (try zooming out to the nationwide view), which is already suspect because of its very selective sample group.

img

Workshop: Fusion Tables and NYC School Data

This is a review of how to use Google Spreadsheets and Fusion Tables to organize and visualize data. The process is virtually the same as the one outlined in my in-depth walkthrough.

Make two Fusion Tables

  1. Import the three Fusion Tables (School Locations, SAT 2010, and SAT 2012)
  2. Merge the 2012 and 2010 SAT tables on the DBN field
  3. Download as CSV
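
For readers who want to see what the merge step does, here is a minimal Python sketch of joining two tables on a shared DBN field. The sample rows and column names are invented. The sketch keeps only schools that appear in both tables, which is an assumption; Fusion Tables' own merge may treat unmatched rows differently.

```python
import csv
import io

# Hypothetical miniature versions of the two SAT tables; the DBN values,
# column names, and scores are invented for illustration.
SAT_2010 = """DBN,Reading 2010
01M000,400
02M001,450
"""
SAT_2012 = """DBN,Reading 2012
01M000,410
03M002,500
"""

def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def merge_on(left, right, key):
    """Join two tables, keeping only rows whose key appears in both."""
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged

merged = merge_on(read_rows(SAT_2010), read_rows(SAT_2012), "DBN")
```

Only school 01M000 appears in both sample tables, so it is the only row in the result, carrying both years' columns.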

Make new data in Spreadsheets

  1. Upload as CSV to Spreadsheets
  2. Sort, then delete the rows with “s” or blank values
  3. Make four new columns
  4. 2010 - 2012 Test Takers: =H2-C2
  5. Conditional Formatting
  6. Make a code column out of the DBN: =RIGHT(A2,4)
  7. Download as CSV
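
The spreadsheet steps above can also be sketched in Python. The column names and sample values are hypothetical, and which year sits in column H versus column C of the real spreadsheet is an assumption.

```python
# Hypothetical merged rows; the column names and values are invented.
rows = [
    {"DBN": "01M000", "takers_2010": "50", "takers_2012": "65"},
    {"DBN": "02M001", "takers_2010": "s", "takers_2012": "40"},
    {"DBN": "03M002", "takers_2010": "", "takers_2012": "30"},
]

def is_usable(value):
    """True when a cell holds a number rather than 's' or a blank."""
    return value.strip().lower() not in ("", "s")

cleaned = []
for row in rows:
    # Drop rows where either year is marked 's' or left blank.
    if not (is_usable(row["takers_2010"]) and is_usable(row["takers_2012"])):
        continue
    new_row = dict(row)
    # Mirrors a subtraction formula such as =H2-C2 (year order assumed).
    new_row["takers_change"] = int(row["takers_2012"]) - int(row["takers_2010"])
    # Mirrors =RIGHT(A2,4): the last four characters of the DBN.
    new_row["code"] = row["DBN"][-4:]
    cleaned.append(new_row)
```

Filtering before doing the arithmetic matters: a cell containing “s” cannot be converted to a number, so the subtraction would fail on those rows.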

Back into Fusion Tables for mapping

  1. Upload the merged 2010-2012 SAT CSV into Fusion Tables
  2. Merge the SAT table with the School Locations table
  3. Create a map that uses latitude and longitude
  4. Play with the Change map styles option. How does the perception of the data change as you base the markers on different columns (e.g. reading scores vs math scores)?
  5. How does the perception change as you alter the colors and the size of the buckets?

The final dataset (which has a few aberrations) can be found here (the 2010-2012 columns in this set show percentages, rather than absolute differences).

Impact of map colors and markers

The NYC SAT scores dataset is an example of what should not be a map. It also serves as an easy example of how a poor selection of markers and thresholds can distort the data even further.

How you divide the buckets (the thresholds for each level) and which range of colors you choose can greatly impact the map.

First, let’s look at the SAT Reading scores and map by how much they changed from 2010 to 2012. According to the Fusion Tables summary, the range of change goes from -17.25% to 16.47%.

If we divide this into 8 different buckets:

img

We get a paintball mess of colors:

img

The other extreme is to divide the data into two buckets: red for anything 0 or negative, green for all positive change:

img

The lack of different buckets obscures the difference between a 1% improvement and a 15% improvement.

Here’s the data in four different buckets; there’s more variety, but no more real insight without additional geographical data (such as Census socioeconomic data):

img
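
To make the effect of bucket choices concrete, here is a small Python sketch that assigns the same values to eight buckets and to two. The equal-width thresholds are an assumption (Fusion Tables lets you set any thresholds), and the sample percent changes are invented; only the overall range comes from the summary above.

```python
from bisect import bisect_left

def equal_width_edges(low, high, n_buckets):
    """Return the n_buckets - 1 interior thresholds that split [low, high]."""
    width = (high - low) / n_buckets
    return [low + width * i for i in range(1, n_buckets)]

def bucket_index(value, edges):
    """Bucket number for a value (0 is lowest); a value equal to a
    threshold falls in the lower bucket."""
    return bisect_left(edges, value)

LOW, HIGH = -17.25, 16.47  # range of change reported above

changes = [-12.0, -0.5, 1.0, 15.0]  # hypothetical percent changes

eight = [bucket_index(c, equal_width_edges(LOW, HIGH, 8)) for c in changes]
two = [bucket_index(c, [0.0]) for c in changes]  # 0 or negative vs positive
```

With eight buckets, the four sample schools land in four different buckets. With two, the 1% and 15% improvements become indistinguishable.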

And is mapping the percentage of change even the right approach? A school that averaged 100 points in 2010 and improved to 150 points in 2012 would boast an improvement of 50%. Yet a score of 150, in absolute terms, is nothing to brag about.
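
The arithmetic is easy to check. In this sketch, the first school uses the numbers from the example above; the second school is hypothetical and is there only for contrast.

```python
def percent_change(old, new):
    """Change from old to new, as a percentage of old."""
    return (new - old) * 100 / old

# The example from the text: a low score that improves a lot in relative terms.
low_scorer = percent_change(100, 150)

# A hypothetical school that starts much higher and gains fewer points.
high_scorer = percent_change(500, 525)
```

Ranked by percentage change, the first school (50%) looks far better than the second (5%), even though the second school's 2012 score is 375 points higher.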

Here are the absolute reading scores for 2012, divided into these four buckets:

img

Under these thresholds, the SAT reading performance looks less impressive:

img

The point: despite how computers have made it easy to make maps, there are many ways that maps can obscure or distort the data.

Related reading

Keep it simple

Like any other form of journalism, visualizations are not better simply because they are bigger or more complex.

Small multiples

Edward Tufte popularized the concept of a “small multiples” chart as a way to make clean visual comparisons across multivariate data:

At the heart of quantitative reasoning is a single question: Compared to what? Small multiple designs, multivariate and data bountiful, answer directly by visually enforcing comparisons of changes, of the differences among objects, of the scope of alternatives. For a wide range of problems in data presentation, small multiples are the best design solution.

One of my favorite examples that Tufte cites is this crude but criminally effective chart of crimes:

img

A WEAKNESS IN GOTTI CASE; Major U.S. Witnesses Viewed as Unreliable

Many lawyers and prosecutors who followed events in the seven-month trial of John Gotti said the underlying weakness of the prosecution’s case was its apparent reliance on turncoat career criminals as key witnesses against Mr. Gotti and six co-defendants. A signal that the credibility of the prosecution’s principal witnesses was in doubt came yesterday morning when the jury, in its final request before acquitting the defendants of all charges, reviewed an exhibit introduced by the defense. It was a chart listing the lengthy criminal records of seven prosecution witnesses who had obtained promises of leniency and other favors from the Government in return for their testimony against Mr. Gotti.

Fancy tables

Don’t discount the humble table as a visualization. This NYT chart of Tour de France cheaters, in its static form, can be thought of as a spreadsheet with images in each cell:

img

Top Finishers of the Tour de France Tainted by Doping

Driving is Why You’re Fat - GOOD

Slopegraph

Again, via Edward Tufte: the slopegraph on the right has fewer data points than the table on the left, because it encodes the change between time periods with connecting lines, which more clearly convey the rate of change between data points:

img

Readings

So far, we’ve been making a lot of maps from data, even though maps aren’t a great way to visualize the data we’ve been looking at. The interactive maps generated by Fusion Tables represent one end of the complexity spectrum – after properly organizing our data, we can generate interactive maps with relative ease.

If we can make maps from our spreadsheets, then we can assume that it’s possible to make a great variety of less-complicated visualizations. In the next class, we’ll go over how to make compelling (non-map) visualizations with a limited toolset.

Read the following articles for next week:

More readings:

Also, check out the other articles about visualization that I’ve listed on this site’s Readings page.
