r geolocation ggplot2 ggmap

ggmap visualization of data with circles on map


I am trying to create a map that uses circles to show the cities where the subjects in my data set originated. I would like each circle's size to be proportional to the number of subjects from that city. I would also like a second, smaller circle inside each one showing how many of those subjects are afflicted by the disease.

I have started doing this with ggmap by getting longitudes and latitudes:

library(ggplot2) 
library(maps)
library(ggmap)
geocode("True Blue, Grenada")

I'm stuck because I don't know how to continue. I can't load the US map alone because there is one location in the Caribbean.

Here is a shortened sample of my data; the actual data set is far too large to post.

subjectid   location            disease
12          Atlanta, GA         yes
15          Boston, MA          no
13          True Blue, Grenada  yes
85          True Blue, Grenada  yes
46          Atlanta, GA         yes
569         Boston, MA          yes
825         True Blue, Grenada  yes
685         Atlanta, GA         no
54          True Blue, Grenada  no
214         Atlanta, GA         no
685         Boston, MA          no
125         True Blue, Grenada  yes
569         Boston, MA          no

Can someone please help?


Solution

  • This should get you started. It does not plot circles within circles. ggplot can be made to map different variables to the same aesthetic (size), but with difficulty. Here, the size of the point represents the total count, and the colour of the point represents the number diseased. You will need to adjust the size scale for your full set of data.

    The code below gets the geographic locations of the cities, then merges them back into the data frame. It then summarises the data to give one row per city containing the required counts. The map is drawn with boundaries set a few degrees beyond the minimum and maximum lon and lat of the cities. The last step is to plot the cities and the counts on the map.

    # load libraries
    library(ggplot2) 
    library(maps)
    library(ggmap)
    library(grid)
    library(plyr)
    
    # Your data
    df <- read.table(header = TRUE, text = "
    subjectid   location           disease
    12          'Atlanta, GA'         yes
    15          'Boston, MA'          no
    13          'True Blue, Grenada'  yes
    85          'True Blue, Grenada'  yes
    46          'Atlanta, GA'         yes
    569         'Boston, MA'          yes
    825         'True Blue, Grenada'  yes
    685         'Atlanta, GA'         no
    54          'True Blue, Grenada'  no
    214         'Atlanta, GA'         no
    685         'Boston, MA'          no
    125         'True Blue, Grenada'  yes
    569         'Boston, MA'          no", stringsAsFactors = FALSE)
    
    # Get geographic locations and merge them into the data frame
    # (recent ggmap versions need a Google API key, set via register_google(),
    # before geocode() will return results)
    geoloc <- geocode(unique(df$location))
    pos <- data.frame(location = unique(df$location), geoloc, stringsAsFactors = FALSE)
    df <- merge(df, pos, by = "location", all = TRUE)
    
    # Summarise the data: one row per city with diseased and total counts
    df <- ddply(df, .(location, lon, lat), summarise, 
       countDisease = sum(disease == "yes"),
       countTotal = length(location))
    
    # Plot the map
    # map_data() returns the world polygons from the maps package as a data frame
    mp1 <- map_data("world")
    
    xmin <- min(df$lon) - 5
    xmax <- max(df$lon) + 7
    ymin <- min(df$lat) - 5
    ymax <- max(df$lat) + 5
    
    Amap <- ggplot() + 
      geom_polygon(aes(x = long, y = lat, group = group), data = mp1, fill = "grey", colour = "grey") + 
      coord_cartesian(xlim = c(xmin, xmax), ylim = c(ymin, ymax)) + 
      theme_bw()
    
    # Plot the cities and counts: point size = total subjects, colour = number diseased
    Amap <- Amap + geom_point(data = df, aes(x = lon, y = lat, size = countTotal, colour = countDisease)) +
        geom_text(data = df, aes(x = lon, y = lat, label = gsub(",.*$", "", location)), size = 2.5,  hjust = -.3) +
        scale_size(range = c(3, 10)) +
        scale_colour_gradient(low = "blue", high = "red", space = "Lab")
    
    # Display the finished map
    print(Amap)
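
The summarising step can be checked on its own, without geocoding or plyr. The following is a minimal base R sketch, assuming only the sample rows from the question; the name `counts` is introduced here for illustration and is not part of the code above.

```r
# Sample data from the question (location and disease status only)
df <- data.frame(
  location = c("Atlanta, GA", "Boston, MA", "True Blue, Grenada",
               "True Blue, Grenada", "Atlanta, GA", "Boston, MA",
               "True Blue, Grenada", "Atlanta, GA", "True Blue, Grenada",
               "Atlanta, GA", "Boston, MA", "True Blue, Grenada",
               "Boston, MA"),
  disease = c("yes", "no", "yes", "yes", "yes", "yes", "yes",
              "no", "no", "no", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# Count rows per location, and rows per location where disease is "yes".
# tapply() orders its result by the sorted location names in both calls,
# so the two vectors line up.
countTotal   <- tapply(df$disease, df$location, length)
countDisease <- tapply(df$disease == "yes", df$location, sum)

counts <- data.frame(
  location     = names(countTotal),
  countDisease = as.vector(countDisease),
  countTotal   = as.vector(countTotal),
  stringsAsFactors = FALSE
)

print(counts)
```

The lon and lat columns would still need to be merged in from the geocoding step before plotting.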
    

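
For the circle-within-a-circle effect the question asks about, one possible approach is to draw two point layers that share a single size scale: an outer point sized by the total count and an inner point sized by the diseased count. The sketch below is an illustration under stated assumptions, not a tested replacement for the code above. The coordinates are approximate, hand-entered placeholders rather than `geocode()` output, and the names `counts` and `nested` are hypothetical. `scale_size_area()` makes point area proportional to the count, so the inner circle fits inside the outer one whenever the diseased count does not exceed the total.

```r
library(ggplot2)

# Per-city counts from the sample data in the question.
# lon/lat are approximate placeholder values, not geocode() results.
counts <- data.frame(
  location     = c("Atlanta", "Boston", "True Blue"),
  lon          = c(-84.39, -71.06, -61.77),
  lat          = c(33.75, 42.36, 12.00),
  countDisease = c(2, 1, 4),
  countTotal   = c(4, 4, 5)
)

nested <- ggplot(counts, aes(x = lon, y = lat)) +
  # Outer circle: all subjects from the city
  geom_point(aes(size = countTotal), colour = "steelblue", alpha = 0.5) +
  # Inner circle: subjects with the disease, drawn on top with the same size scale
  geom_point(aes(size = countDisease), colour = "red") +
  geom_text(aes(label = location), size = 2.5, vjust = -3) +
  # Area (not radius) proportional to the count
  scale_size_area(max_size = 15, name = "Subjects")

print(nested)
```

The same two `geom_point()` layers could presumably be added to the base map built with `geom_polygon()` in place of the single coloured layer; `max_size` would need adjusting for the full data set.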