Tags: r, ggplot2, shapefile, rgdal

Wrong districts filled on state map plot


I have a shapefile of school districts in Texas and am trying to use ggplot2 to highlight 10 in particular. I've tinkered with it and gotten everything set up, but when I spot-checked it I realized the 10 districts highlighted are not in fact the ones I want highlighted.

The shapefile can be downloaded from this link to the Texas Education Agency Public Open Data Site.

#install.packages(c("ggplot2", "rgdal"))
library(ggplot2)
library(rgdal)
#rm(list=ls())

#setwd("path")

# read shapefile
tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))

# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")

# extract from shapefile data just the name and ID, then subset to only the districts of interest
dist_info <- data.frame(cbind(as.character(tex@data$NAME2), as.character(tex@data$FID)), stringsAsFactors=FALSE)
names(dist_info) <- c("name", "id")
dist_info <- dist_info[dist_info$name %in% districts, ]

# turn shapefile into df
tex_df <- fortify(tex)

# create dummy fill var for if the district is one to be highlighted
tex_df$yes <- as.factor(ifelse(tex_df$id %in% dist_info$id, 1, 0))


# plot the graph
ggplot(data=tex_df) +
  geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") + 
  scale_fill_manual(values=cols) +
  theme_void() +
  theme(legend.position = "none")

As you'll see, when the plot gets created it looks like it's done exactly what I want. The problem is that the ten districts highlighted are not the ones in the districts vector above. I've rerun everything clean numerous times, double-checked that I wasn't having a factor/character conversion issue, and double-checked in the web data explorer that the IDs I get from the shapefile are indeed the ones that should match my list of names. I really have no idea where this issue could be coming from.

This is my first time working with shapefiles and rgdal so if I had to guess there's something simple about the structure that I don't understand and hopefully one of you can quickly point it out for me. Thanks!

Here's the output:

[image: the resulting map with ten districts highlighted]


Solution

  • Alternative 1

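    Pass region = "NAME2" to fortify() so that the id column holds your district names, then build the dummy fill variable from that column. (Without region, fortify() appears to take id from each polygon's internal ID, typically a 0-based row name, which need not equal the FID attribute; that would explain the mismatch.) I am not familiar with Texas districts, so I have not verified the result, but I assume it is right.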

    tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))
    
    # colors to use and districts to highlight
    cols<- c("#CCCCCC", "#003082")
    districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")
    
    # turn shapefile into df
    tex_df <- fortify(tex, region = "NAME2")
    
    # create dummy fill var for if the district is one to be highlighted
    tex_df$yes <- as.factor(ifelse(tex_df$id %in% districts, 1, 0))
    
    # plot the graph
    ggplot(data=tex_df) +
      geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") +
      scale_fill_manual(values=cols) +
      theme_void() +
      theme(legend.position = "none")
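
The mismatch in the question can be reproduced without the shapefile. The sketch below uses a toy data frame as a stand-in for tex@data and assumes, as is typical but not confirmed for this particular file, that the polygon IDs returned by fortify() are the 0-based row names while the FID column starts at 1. All names and values in it are illustrative.

```r
# Toy stand-in for tex@data (hypothetical values): row names play the role
# of the polygon IDs that fortify() would return, FID is an attribute column.
toy <- data.frame(
  FID   = 1:4,
  NAME2 = c("Aldine", "Bryan", "Donna", "Laredo"),
  row.names = as.character(0:3),
  stringsAsFactors = FALSE
)
wanted <- c("Bryan", "Laredo")

# IDs looked up through the FID column (what the question does)
ids_from_fid <- as.character(toy$FID[toy$NAME2 %in% wanted])

# IDs looked up through the row names (what fortify() is assumed to use)
ids_from_rownames <- rownames(toy)[toy$NAME2 %in% wanted]

# Polygon IDs as they would appear in the fortified data frame
poly_ids <- rownames(toy)

# Matching on FID is shifted by one and highlights the wrong district
highlight_wrong <- toy$NAME2[poly_ids %in% ids_from_fid]

# Matching on row names highlights the intended districts
highlight_right <- toy$NAME2[poly_ids %in% ids_from_rownames]

print(highlight_wrong)  # "Donna"
print(highlight_right)  # "Bryan" "Laredo"
```

On the real data, comparing head(rownames(tex@data)) with head(tex@data$FID) should show whether the same offset is present.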
    


  • Alternative 2

    This alternative does not pass region to fortify(), which addresses the problem seeellayewhy had implementing the previous alternative. Instead, split the shapefile into two subsets and plot each as its own layer; there is no need to create a dummy variable or merge any data frames.

    tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))
    
    # colors to use and districts to highlight
    cols<- c("#CCCCCC", "#003082")
    districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")
    
    # Subset the shapefile into two
    tex1 <- subset(tex, NAME2 %in% districts)
    tex2 <- subset(tex, !(NAME2 %in% districts)) 
    
    # Create two data frames
    tex_df1 <- fortify(tex1)
    tex_df2 <- fortify(tex2)
    
    # Plot two geom_polygon layers, one per data frame. The fills are constants,
    # so they are set outside aes() and no fill scale or legend is needed.
    ggplot() +
      # all other districts, in grey
      geom_polygon(data = tex_df2,
                   aes(x = long, y = lat, group = group),
                   fill = cols[1]) +
      # highlighted districts, in blue with a grey outline
      geom_polygon(data = tex_df1,
                   aes(x = long, y = lat, group = group),
                   fill = cols[2], color = "#CCCCCC") +
      theme_void()
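
Both alternatives match districts by name, so it may be worth checking that every requested name occurs exactly once in NAME2 before plotting. The following is a minimal sketch using a made-up vector in place of tex$NAME2; the values are hypothetical and chosen only to show one missing name and one duplicated name.

```r
# Requested names (shortened list for illustration)
districts <- c("Aldine", "Laredo", "Donna")

# Hypothetical stand-in for as.character(tex$NAME2)
name2 <- c("Aldine", "Laredo", "Laredo", "Houston", "Bryan")

# Names that were requested but do not occur in the shapefile at all
missing_names <- setdiff(districts, name2)

# How many polygons match each requested name
match_counts <- table(factor(name2[name2 %in% districts], levels = districts))

# Names that match more than one polygon
ambiguous <- names(match_counts)[match_counts > 1]

print(missing_names)  # "Donna"
print(ambiguous)      # "Laredo"
```

An empty missing_names and an empty ambiguous would suggest the name-based subset selects exactly the intended districts.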