07 January 2012

Color Me Happy

I posted the following comment on Offensive Politics' post on its graphic presentation of Iowa results:

This post got referenced on Revolution Analytics, and garnered a comment that the (default) HCL scale used by ggplot is faulty. I commented that the color scale reflects the closeness of the vote perfectly; the confusion is accurate, because the race really was that close.

So, off I went to Wickham's ggplot2 book, chapter 6 on scales. scale_fill_gradient2() and library(vcd) provide ways to impose non-linear color scales. At first glance, a good thing. But I wonder. There's been a good deal of discussion, here and there, about spinning data. It would seem that using a non-linear color scale is just as bad as fiddling with axis scales.

What do you think?

I mentioned this post earlier; it was subsequently linked from Revolution Analytics. One of the prime directives of professional statistics, as distinct from polemic number spewing, is that form shouldn't affect perception. In other words, don't fiddle the picture to plant an idea the data don't support. In this case, the original commenter complained that the color scale made it difficult to tell which candidate won each county. My view, so far at least, is that the defaults R/ggplot uses for this plot rightly reflect how close Romney and Santorum were.

The default color scale is, essentially, linear. The alternatives mentioned are decidedly non-linear. Now, the notion of linearity with respect to color scales may be problematic, but the ggplot defaults do implement a notion of linearity, in that two opposite colors are blended linearly.
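To make "blended linearly" concrete, here is a minimal sketch in Python. It assumes plain RGB blending between two hypothetical endpoint colors (blue for Romney, red for Santorum) and maps Santorum's share of the two-candidate vote straight onto the blend. ggplot2 itself blends in a perceptual color space rather than RGB, so the actual hues differ; the point is only that equal steps in the data give equal steps in color, so a 48/52 county lands right next to the purple midpoint.

```python
def lerp_color(c0, c1, t):
    """Linearly blend two RGB colors: t=0 gives c0, t=1 gives c1."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))

def to_hex(rgb):
    """Format an (r, g, b) tuple as a #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)

# Hypothetical endpoint colors, NOT ggplot's actual defaults.
ROMNEY = (0, 0, 255)     # pure blue
SANTORUM = (255, 0, 0)   # pure red

def county_color(santorum_share):
    """Map Santorum's share of the two-candidate vote directly onto the blend."""
    return lerp_color(ROMNEY, SANTORUM, santorum_share)

for share in (0.0, 0.48, 0.50, 0.52, 1.0):
    print(share, to_hex(county_color(share)))
```

Note how little the 0.48 and 0.52 counties differ: a close race produces nearly indistinguishable colors, which is exactly what the data say.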

Here are examples from Wickham's ggplot2 text. I suppose it was uploaded legally. I hope.

These are the various color scalings:

As can be seen, the color gradient in the right two alternatives is far more extreme. I'm still not convinced that one should do such fiddling.
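To see why a more extreme gradient worries me, here is a rough sketch of what a non-linear remap does to a close race. I don't claim this is what scale_fill_gradient2() or vcd actually compute; the logistic curve and its steepness `k` are a hypothetical stand-in, and the colors are the same assumed RGB blue/red as before. The remap pushes shares away from 0.5 before blending, so a 48/52 split gets painted as if it were a clear win.

```python
import math

def lerp_color(c0, c1, t):
    """Linearly blend two RGB colors: t=0 gives c0, t=1 gives c1."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))

def logistic_stretch(share, k=20.0):
    """Hypothetical non-linear remap: steep S-curve centered at 0.5.

    Larger k pushes values near 0.5 harder toward 0 or 1.
    """
    return 1.0 / (1.0 + math.exp(-k * (share - 0.5)))

BLUE = (0, 0, 255)   # assumed color for one candidate
RED = (255, 0, 0)    # assumed color for the other

for share in (0.48, 0.50, 0.52):
    linear = lerp_color(BLUE, RED, share)
    stretched = lerp_color(BLUE, RED, logistic_stretch(share))
    print(f"{share:.2f}  linear={linear}  stretched={stretched}")
```

With the linear blend, the red channel moves by about 11 (out of 255) between the 48% and 52% counties; after the stretch it moves by about 51. Same data, a much louder picture, which is the same move as fiddling with an axis.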
