Wednesday, August 28, 2013

Class is in...

When I started this blog several years ago, the original intent was to provide notes for a sales team and clients.  It was a basic cartographic education for people who had no background in displaying data on maps, or who would simply accept the defaults in the software they were using with little thought to what they were describing.  The goal was to provide enough information for them to make better decisions, not just in the map making process, but in interpreting maps as well.

One of the most common maps you may see, especially in the media, is a choropleth or shaded area map.  The goal is to compress vast amounts of data into digestible portions for the map readers. Whereas a table of numbers provides details for individual locations, the map should generalize these locations, collecting them into chunks of similar values so the map is easier to interpret. These chunks are called classes.  The process of chunking is referred to as classification.

There are other factors to take into consideration when developing a choropleth map, such as the number of classes, colors and color ramp.  Altering any of those can have a dramatic effect on the story you are telling with the map.  I'll discuss those in a future article.  For now we will briefly look at three methods of classification: Equal Interval, Quantiles and Natural Breaks.

Take for example the map below.  It is serviceable to the extent values are displayed spatially, but you would be hard pressed to make any sense of the distribution.  This would be far worse with hundreds or thousands of values.  Below are three common methods used for classifying data.

Classless Map
This map has no class.
We can look at this data another way by placing the values on a number line.  Here we see the frequency of the values.  Five of the values are duplicated and there are some fairly large gaps present.  These facts will be important later.
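
If you do not have the number line in front of you, the same two facts can be checked with a few lines of Python. This is only a sketch: the list `VALUES` is my transcription of the 40 values used in this post, and the variable names are made up for illustration.

```python
from collections import Counter

# The 40 values used in this post (transcribed by hand; an assumption).
VALUES = [1, 2, 3, 4, 5, 8, 10, 11, 11, 12, 13, 14, 15, 15, 16, 20, 23, 24,
          26, 31, 31, 35, 47, 49, 49, 50,
          80, 81, 83, 85, 86, 90, 91, 92, 92, 93, 95, 96, 98, 100]

# Frequency of each value, as on the number line.
counts = Counter(VALUES)
duplicated = sorted(v for v, n in counts.items() if n > 1)

# Gaps between neighbouring distinct values, widest first: (width, from, to).
distinct = sorted(counts)
gaps = sorted(((b - a, a, b) for a, b in zip(distinct, distinct[1:])),
              reverse=True)

print(duplicated)  # [11, 15, 31, 49, 92]
print(gaps[:2])    # [(30, 50, 80), (12, 35, 47)]
```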

Equal Interval classes each cover an equal range of values along the number line.  In this method, the range of the values is the basis of the class breaks.  Let’s classify the values in the map above using Equal Interval, 4 classes.  First we need to know the range.  The range from 1 to 100 is 99 (100 minus 1).  Divide the range by the number of classes you wish to use, in this case 4, to get a class width of 24.75.  Starting from the minimum, a new class begins at every interval of 24.75; the values are placed in those classes and mapped below.  Because the values have no fractions, the software rounded the breaks for the map below.

Class 1 (1 to 25.75): 1 2 3 4 5 8 10 11 11 12 13 14 15 15 16 20 23 24
Class 2 (25.76 to 50.5): 26 31 31 35 47 49 49 50
Class 3 (50.51 to 75.25): (no values)
Class 4 (75.26 to 100): 80 81 83 85 86 90 91 92 92 93 95 96 98 100
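
The arithmetic is easy to reproduce. The sketch below assumes the 40 values used in this post and two helper functions I named for illustration (`equal_interval_breaks` and `classify`); it is not how any particular GIS package does it. A value that lands exactly on a break goes to the lower class.

```python
from bisect import bisect_left

# The 40 values used in this post (transcribed by hand; an assumption).
VALUES = [1, 2, 3, 4, 5, 8, 10, 11, 11, 12, 13, 14, 15, 15, 16, 20, 23, 24,
          26, 31, 31, 35, 47, 49, 49, 50,
          80, 81, 83, 85, 86, 90, 91, 92, 92, 93, 95, 96, 98, 100]


def equal_interval_breaks(values, n_classes):
    """Return n_classes + 1 break points spaced evenly from min to max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(n_classes + 1)]


def classify(values, breaks):
    """Put each value in the first class whose upper break is >= the value."""
    upper_bounds = breaks[1:]
    classes = [[] for _ in upper_bounds]
    for v in sorted(values):
        classes[bisect_left(upper_bounds, v)].append(v)
    return classes


breaks = equal_interval_breaks(VALUES, 4)
classes = classify(VALUES, breaks)

print(breaks)                     # [1.0, 25.75, 50.5, 75.25, 100.0]
print([len(c) for c in classes])  # [18, 8, 0, 14]
```

The zero in the third position is the empty class.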

Equal Interval Map
An Equal Interval Map.
With this method, we have an empty class, indicated by the range in Class 3.  The legend shows a color for it, but that color appears nowhere on the map.  This method is relatively quick to calculate and has been used as a default in many GIS applications for that reason.

Warning: Equal Interval is not suitable for this sort of data.  Because every class covers a range of the same width, it is best to use this classification method when dealing with continuous data such as rainfall, temperature and elevation.

Quantile places an equal number of data values in each class.  This is the easiest method to create and interpret.  You would simply divide the number of data values (40) by the number of classes you wish to use.  

40 data values / 4 classes = 10 data values in each class

Class 1: 1 2 3 4 5 8 10 11 11 12
Class 2: 13 14 15 15 16 20 23 24 26 31
Class 3: 31 35 47 49 49 50 80 81 83 85
Class 4: 86 90 91 92 92 93 95 96 98 100
Quantile Map
A Quantile Map
Quantiles are also very quick to create and are very easy to interpret.  In the map above you can quickly see the top and bottom 10 areas.  However, there is one problem that can occur with Quantiles: overlapping classes.

Look at Class 2 and Class 3. The value 31 shows up in each one.  As the cartographer, you would either need to change the number of classes, or make a decision as to which class 31 should belong in.  In the case of the map above, the software chose the class for those values and adjusted the breaks accordingly, meaning two of the classes do not have exactly 10 values.
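
Here is a small sketch of the same idea in Python. It assumes the 40 values used in this post and that the count divides evenly by the number of classes; the function names `quantile_classes` and `split_ties` are mine, made up for illustration. It does not decide where a tied value should go, it only reports the tie.

```python
# The 40 values used in this post (transcribed by hand; an assumption).
VALUES = [1, 2, 3, 4, 5, 8, 10, 11, 11, 12, 13, 14, 15, 15, 16, 20, 23, 24,
          26, 31, 31, 35, 47, 49, 49, 50,
          80, 81, 83, 85, 86, 90, 91, 92, 92, 93, 95, 96, 98, 100]


def quantile_classes(values, n_classes):
    """Cut the sorted values into n_classes chunks of equal size.

    Assumes len(values) is evenly divisible by n_classes.
    """
    ordered = sorted(values)
    size = len(ordered) // n_classes
    return [ordered[i * size:(i + 1) * size] for i in range(n_classes)]


def split_ties(classes):
    """Return values that end one class and also start the next one."""
    return [a[-1] for a, b in zip(classes, classes[1:]) if a[-1] == b[0]]


classes = quantile_classes(VALUES, 4)

print([len(c) for c in classes])  # [10, 10, 10, 10]
print(split_ties(classes))        # [31]
```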

Natural Break classes conform to the natural gaps in the data.  Look at the number line image again and you will see there are fairly large gaps between 35 and 47 and between 50 and 80, and a smaller one between 16 and 20. Traditionally, the breaks have been selected manually by the cartographer; however, GIS applications have automated the process.  Let’s look at our data again:

Class 1: 1 2 3 4 5 8 10 11 11 12 13 14 15 15 16
Class 2: 20 23 24 26 31 31 35
Class 3: 47 49 49 50
Class 4: 80 81 83 85 86 90 91 92 92 93 95 96 98 100
Natural Breaks Map
A Natural Breaks Map
Using Natural Breaks, the gaps in the data were considered when selecting the breaks.  The largest benefit of this method is that you can see any outliers, values that may fall well outside of the norm for the data.  You also eliminate the nasty problems of empty and overlapping classes caused by the Equal Interval and Quantile methods.  You might have decided to break the classes differently, and GIS software would also do so in a different manner, using a more complex algorithm to determine the breaks.
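
To show how much the result depends on the rule, here is a deliberately simple sketch: cut at the widest gaps between neighbouring values. This is not the algorithm GIS software uses, only a crude stand-in, and the function name `largest_gap_classes` is made up for illustration. It assumes the 40 values used in this post.

```python
# The 40 values used in this post (transcribed by hand; an assumption).
VALUES = [1, 2, 3, 4, 5, 8, 10, 11, 11, 12, 13, 14, 15, 15, 16, 20, 23, 24,
          26, 31, 31, 35, 47, 49, 49, 50,
          80, 81, 83, 85, 86, 90, 91, 92, 92, 93, 95, 96, 98, 100]


def largest_gap_classes(values, n_classes):
    """Split the sorted values at the (n_classes - 1) widest gaps."""
    ordered = sorted(values)
    # (gap width, index of the value just before the gap)
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    widest = sorted(gaps, reverse=True)[:n_classes - 1]
    cut_after = sorted(i for _, i in widest)

    classes, start = [], 0
    for i in cut_after:
        classes.append(ordered[start:i + 1])
        start = i + 1
    classes.append(ordered[start:])
    return classes


classes = largest_gap_classes(VALUES, 4)

print([(c[0], c[-1]) for c in classes])  # [(1, 26), (31, 35), (47, 50), (80, 100)]
print([len(c) for c in classes])         # [19, 3, 4, 14]
```

This rule cuts between 26 and 31 instead of between 16 and 20, because that gap is one unit wider. Neither answer is wrong; they are two different readings of where the data "naturally" breaks.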

This is all fine, but which should you use?  If you are simply exploring the data, it is always useful to look at each one and see how the data is spread across the map.  Also look at a frequency table of the data.  Are there any large gaps?  Is the data flat, or is there a curve?  Any one method may provide some insight you might have missed with another. However, if you are creating static displays for another reader, these brief guides may be helpful:

Natural Breaks:  You have discrete data such as population counts (people, pigs, cars, etc…) or normalized values (people per square mile, median/per capita income, etc…).

Equal Interval:  You have continuous data such as elevation, rainfall, temperature or other data that is continuous across a surface.

Quantiles: Quick view of the geographic spread, or time series of discretely recorded data.  This method works well for time series/animation since there are an equal number of geographic units in each class.  The reader can easily see change over time by comparing differences between larger areas.
