Search code examples
rggplot2geospatialcensus

Drawing maps based on census data using ggplot2


I have a list of points that I want to overlay on a map of San Francisco using ggplot2. Each point is a longitude, latitude pair. I want the resulting map to be in a longitude/latitude coordinate system. I managed to reproduce Hadley Wickham's directions for plotting polygon shapefiles using his example file. I am using R 2.15.1 for Windows.

However, I tried to use cdp files downloaded from the UScensus2010cdp package. Here's my code snippet:

require("rgdal") 
require("maptools")
require("ggplot2")
require("sp")
require("plyr")
gpclibPermit() # required for fortify method
require(UScensus2010)
require(UScensus2010cdp)
data(california.cdp10)
sf <- city(name = "san francisco", state="ca")
sf.points = fortify(sf)

I get the following error:

Using name to define regions.
Error in unionSpatialPolygons(cp, invert(polys)) : input lengths differ
In addition: Warning message:
In split(as.numeric(row.names(attr)), addNA(attr[, region], TRUE)) :
   NAs introduced by coercion

Does anybody know:

  1. What is a good value to give to the region parameter of fortify()?
  2. If that fails, a source of map data with untransformed lat/long coordinates for San Francisco that ggplot2 can draw?
  3. Alternatively, I found here another map of San Francisco, whose data is translated. Can you tell me how to either translate this data to raw lat/long or make the reverse translation for my set of points?

Solution

  • note:

    The issue

    The issue arises from the fact that fortify.SpatialPolygonsDataFrame relies on converting the row.names to numeric, and the rownames of your data are the identifiers.

    ggplot2:::fortify.SpatialPolygonsDataFrame 
    
    function (model, data, region = NULL, ...) 
    {
        attr <- as.data.frame(model)
        if (is.null(region)) {
            region <- names(attr)[1]
            message("Using ", region, " to define regions.")
        }
        polys <- split(as.numeric(row.names(attr)), addNA(attr[, 
            region], TRUE))
        cp <- polygons(model)
        try_require(c("gpclib", "maptools"))
        unioned <- unionSpatialPolygons(cp, invert(polys))
        coords <- fortify(unioned)
        coords$order <- 1:nrow(coords)
        coords
    }
    

    In your case

    row.names(sf@data)
    ## [1] "california_586" "california_590" "california_616"
    

    are the identifiers you wish to use as the region parameters, as place state and name do not uniquely identify the three polygons.

    # as.character used to coerce from factor
    lapply(lapply(sf@data[,c('place','state','name')], unique), as.character)
    ## $place
    ## [1] "67000"
    ## 
    ## $state
    ## [1] "06"
    ## 
    ## $name
    ## [1] "San Francisco"
    

    As a character vector where the elements begin with alphabetic characters, when coerced to numeric, it becomes NA

    as.numeric(rownames(sf@data))
    ## [1] NA NA NA
    ## Warning message:
    ## NAs introduced by coercion
    

    Which is one of the warnings given

    Solution

    1. Define a column to be the rownames
    2. Set the row.names to NULL or 1:nrow(sf@data)

    So..

    # rownames
    sf@data[['place_id']] <- rownames(sf@data)
    row.names(sf@data) <- NULL
    
    # fortify
    sf_ggplot <- fortify(sf, region = 'place_id')
    # merge to add the original data
    sf_ggplot_all <- merge(sf_ggplot, sf@data, by.x = 'id', by.y = 'place_id')
    # very basic and uninteresting plot
    ggplot(sf_ggplot_all,aes(x=long,y=lat, group = group)) + 
      geom_polygon(aes(fill =pop2000)) + 
      coord_map()
    

    enter image description here