15 November 2011

The Red and the Blue

I was going to build a US map (using R facilities) showing net federal funds at the state level, but found there are a colossal number of these already. No need to demonstrate that yet again. The point would have been to do so within the RDBMS, following in the Triage piece's footsteps. Instead, I'll just show the code to generate a US map, as shown in Wickham's book.

CREATE OR REPLACE FUNCTION "public"."test_graph" () RETURNS text AS
$BODY$
# Load each package explicitly from the postgres user's library;
# PL/R does not pull in dependencies on its own.
library(maps, lib.loc="/home/postgres/R/x86_64-unknown-linux-gnu-library/2.14/")
library(plyr, lib.loc="/home/postgres/R/x86_64-unknown-linux-gnu-library/2.14/")
library(proto, lib.loc="/home/postgres/R/x86_64-unknown-linux-gnu-library/2.14/")
library(reshape, lib.loc="/home/postgres/R/x86_64-unknown-linux-gnu-library/2.14/")
library(grid, lib.loc="/home/postgres/R/x86_64-unknown-linux-gnu-library/2.13/")
library(ggplot2, lib.loc="/home/postgres/R/x86_64-unknown-linux-gnu-library/2.14/")
# State outlines, plus the USArrests data keyed by lower-case state name
states <- map_data("state")
arrests <- USArrests
names(arrests) <- tolower(names(arrests))
arrests$region <- tolower(rownames(USArrests))
# Join the data to the polygons, then restore the vertex drawing order
choro <- merge(states, arrests, by = "region")
choro <- choro[order(choro$order), ]
print(qplot(long, lat, data = choro, group = group, fill = assault, geom = "polygon", asp = .6) + borders("state", size = .5))
$BODY$
LANGUAGE 'plr';

Rather more text than the Triage demonstration. R is built on a multi-user model, but is normally used as a standalone application on a PC. And then, there's the *nix issue. The upshot is that nothing need be done to use "base" modules, and those include the scatterplot matrix in the Triage piece. As mentioned in the piece, R supports (at least) two other graphics engines: lattice and ggplot2. Lattice is an implementation of Trellis graphics built on the grid package, while ggplot2 is an implementation of a grammar-based graphics engine. The Grammar of Graphics itself is documented in Wilkinson's book, which describes the grammar but is not a code base. At 712 pages and no code, we'll see (just Amazoned it)!

This map was created with ggplot2 functions, although no database data is used. It is necessary to call out each package/library explicitly, as well; PL/R doesn't know to load dependent packages, alas. In the context of the Triage piece, the application would show the net position of the party's candidates by state, along a Blue/Red vector. Just so happens that the R installation includes some state level data, which Wickham uses illustratively. One might extrapolate that Red States are more violent than Blue States, on the whole. Not that I'm making such an extrapolation, of course.
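To put database data on the map instead of USArrests, the merge step stays the same; only the right-hand data frame changes. Here's a minimal base-R sketch of that step. The data frame `net` and its `margin` column are invented for illustration (inside PL/R, something like it would presumably come back from `pg.spi.exec()`), and `states` is a tiny hand-made stand-in for `map_data("state")` so the snippet runs without ggplot2:

```r
# Stand-in for map_data("state"): polygon vertices with a drawing order.
# The real thing has many more rows; these values are made up.
states <- data.frame(
  region = c("ohio", "ohio", "ohio", "texas", "texas", "texas"),
  long   = c(-84.8, -80.5, -82.0, -106.6, -93.5, -97.0),
  lat    = c(39.1, 40.6, 41.9, 31.9, 31.0, 26.0),
  group  = c(1, 1, 1, 2, 2, 2),
  order  = 1:6,
  stringsAsFactors = FALSE
)

# Hypothetical per-state measure, standing in for a query result
# (negative = Red, positive = Blue in this sketch).
net <- data.frame(
  region = c("texas", "ohio"),
  margin = c(-12.5, 3.0),
  stringsAsFactors = FALSE
)

# merge() sorts by the key and can disturb the vertex order, so the
# polygons are put back in drawing order before plotting.
choro <- merge(states, net, by = "region")
choro <- choro[order(choro$order), ]
```

The `fill = assault` in the qplot call would then become `fill = margin`.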

Loading of non-base libraries can be done in one of two ways. If the R engine library directory has global write permission (not normally so under *nix), then any package (which is then called a library in use, yeah, I know) installed by any user goes to that directory and can be referred to directly by PL/R. In usual installs, on the other hand, each user has packages installed to a local directory. Since postgres (the engine) runs as postgres (the user), the packages need to be installed by postgres (the user) from an R session. In a corporate environment (and political campaigns are very much so), standards and conventions would need to be established.
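The per-user route might look roughly like this from an R session run as the postgres user. This is a sketch, not a record of what was actually typed for this post: the helper `load_from` is hypothetical, and the package list is simply the one from the function above.

```r
# Run once, as the postgres OS user, from an interactive R session:
#   install.packages(c("maps", "plyr", "proto", "reshape", "ggplot2"))
# Without write access to the system library, R offers to create and use
# a per-user library, which is the directory the lib.loc arguments name.

# Hypothetical helper: attach packages one at a time, in the order given,
# from a library directory (NULL means the default search path).
load_from <- function(pkgs, lib = NULL) {
  vapply(pkgs, function(p) {
    suppressPackageStartupMessages(
      require(p, lib.loc = lib, character.only = TRUE, quietly = TRUE)
    )
  }, logical(1))
}

# Base packages need no lib.loc; user-installed ones would pass the
# postgres user's library path instead of NULL.
loaded <- load_from(c("stats", "utils"))
```

`loaded` is a named logical vector, so a PL/R function could check `all(loaded)` before trying to plot anything.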

Ideally, what I'd want, following on the thesis of the Triage piece, is a clickable map (states), but that gets into non-rectangular HTML buttons (Google Maps, I'd wager); not a topic I'm conversant with, yet. Whether it would make sense to generate the map in R with the cruft needed to implement the button logic is another puzzle. I think not, but I'm not sure; R doesn't impress me as a strong string manipulation language. Ideally, then, the map would not only be generated by R, but each state would be a button, which would call a second R function in postgres to show the county/municipal/zip map. Could be a bit of work, but your candidates are worth it.
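For what it's worth, plain HTML does support non-rectangular click targets through image maps (`<map>` with `<area shape="poly">`), and the string work needed in R looks modest. Here's a rough sketch, assuming the polygon vertices have already been converted to pixel coordinates of the rendered png (that conversion is the hard part and is not shown). The function name, the map name, and the `/county?state=` URL are all made up for illustration:

```r
# Build an HTML image map: one <area shape="poly"> per state.
# `px` is a data frame with columns region, x, y (pixel coordinates,
# vertices in drawing order). The href pattern is hypothetical.
state_image_map <- function(px, map_id = "states") {
  areas <- vapply(split(px, px$region), function(s) {
    name <- as.character(s$region[1])
    # coords is a flat "x1,y1,x2,y2,..." list
    coords <- paste(as.vector(rbind(round(s$x), round(s$y))), collapse = ",")
    sprintf('<area shape="poly" coords="%s" href="/county?state=%s" alt="%s">',
            coords, URLencode(name, reserved = TRUE), name)
  }, character(1))
  paste0('<map name="', map_id, '">\n',
         paste(areas, collapse = "\n"),
         '\n</map>')
}

# Made-up pixel coordinates for a single triangular "state"
px <- data.frame(
  region = c("ohio", "ohio", "ohio"),
  x = c(410, 450, 430),
  y = c(180, 170, 150),
  stringsAsFactors = FALSE
)
html <- state_image_map(px)
cat(html, "\n")
```

The page would then tie the picture to the map with `<img src="..." usemap="#states">`, and each state's link could hit whatever serves the second, county-level function.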

Here's the picture (this is a png, since Blogger won't chew pdf):
