Tags: r, ggplot2, visualization, r-sf, beeswarm

R visualization: sensibly repelling points on a map (beeswarm?)


I'm trying to roughly replicate a map like this one.

It depicts a small number of items (schools) spread across an area. As input, I have a map of areas with a number for each of them, and I would like to lay each number out as that many points within its area. It would be even better if the points didn't spill across area boundaries, but simply distributing them is enough. Some nice repelling of points within each area might work.

Beeswarm plots do something quite similar; could this be done on a map? Bonus question: I've actually been looking to animate this, so that new points are added as the totals increase over time, but I can only think of very complicated ways to do it.
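A possible approach to the animation, sketched in base R under assumptions: sample each area's points once using its final (largest) count, then in each frame show only the first k points of an area whose count at that time is k. Existing points then stay put and new ones simply appear. `cumulative_frames` is a hypothetical helper, the counts are made up, and counts are assumed never to decrease over time.

```r
# Hypothetical helper: build one row per visible point per frame.
# pts_by_area: named list of 2-column matrices, each sampled once with the
#   area's final (maximum) count, e.g. from st_sample() + st_coordinates().
# counts: matrix of counts, rows = areas (same names), columns = time steps.
cumulative_frames <- function(pts_by_area, counts) {
  # assumption: counts are non-negative and never decrease over time
  stopifnot(all(counts >= 0),
            all(apply(counts, 1, function(r) all(diff(r) >= 0))))
  rows <- list()
  for (t in seq_len(ncol(counts))) {
    for (a in rownames(counts)) {
      k <- counts[a, t]
      if (k == 0) next
      stopifnot(k <= nrow(pts_by_area[[a]]))
      # always take the same first k points, so earlier points persist
      p <- pts_by_area[[a]][seq_len(k), , drop = FALSE]
      rows[[length(rows) + 1]] <- data.frame(
        frame = t, area = a, point_id = seq_len(k), x = p[, 1], y = p[, 2]
      )
    }
  }
  do.call(rbind, rows)
}

set.seed(1)
counts <- rbind(A = c(1, 3, 4), B = c(0, 2, 5))  # made-up counts per time step
pts_by_area <- lapply(c(A = 4, B = 5), function(n) cbind(runif(n), runif(n)))

frames <- cumulative_frames(pts_by_area, counts)
table(frames$frame)  # 1, 5 and 9 points in frames 1, 2 and 3
```

The result is a long table with one row per visible point per frame, a format that an animation package such as gganimate can step through by its `frame` column (for example with `transition_manual()`). If the points come from a hexagonal sample, they probably come out in grid order, so shuffling each area's points once beforehand (e.g. `p[sample(nrow(p)), ]`) avoids filling areas row by row.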

The code below places the points at the centroids of the areas on the map and maps the number to point size. (I was unable to export the map properly as a single file, so the coordinates are a bit messed up, but the principle is the same.)

library(sf)
library(readr)
library(dplyr)
library(ggplot2)

places <- st_read("https://gist.githubusercontent.com/peeter-t2/9646a4169e993948fa97f6f503a0688b/raw/cb4e910bf153e51e3727dc9d1c73dd9ef86d2556/kih1897m.geojson", stringsAsFactors = FALSE)

schools <- read_tsv("https://gist.github.com/peeter-t2/34467636b3c1017e89f33284d7907b42/raw/6ea7dd6c005ef8577b36f5e84338afcb6c76b707/school_nums.tsv")
schools_geo <- merge(places, schools, by.x = "KIHELKOND", by.y = "Kihelkond") # 94 matches

# draw the areas, plus one point per area at its centroid, sized by 'value'
p <- schools_geo %>% 
  ggplot() +
  geom_sf(data = schools_geo) +
  geom_sf(data = st_centroid(schools_geo), aes(size = value)) +
  theme_bw()
p

Thanks!


Solution

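  • As I noted in my comments, when I read in the file, its CRS is set to lat/lon (EPSG:4326), while the coordinates in the geometry column are in a different CRS. I have guessed that the correct CRS is EPSG:3301 (Estonian Coordinate System of 1997) and proceeded on that basis, which seems to work fine. Note that assigning with st_crs<- only relabels the coordinates; it does not reproject them (that is what st_transform does).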

    st_crs(schools_geo) <- 3301
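One way to confirm such a mismatch is to compare the bounding box against the valid lon/lat ranges (longitude within ±180, latitude within ±90); extents in the hundreds of thousands of units point to a projected CRS instead. The base-R sketch below assumes a bounding box given as `c(xmin, ymin, xmax, ymax)`, the order returned by `sf::st_bbox()`. `looks_like_lonlat` is a hypothetical helper, and the two example extents are made up for illustration, not read from the gist.

```r
# Hypothetical helper: could these bbox values be geographic lon/lat degrees?
# Expects c(xmin, ymin, xmax, ymax), the order used by sf::st_bbox().
looks_like_lonlat <- function(bbox) {
  bbox <- unname(as.numeric(bbox))
  stopifnot(length(bbox) == 4)
  all(bbox[c(1, 3)] >= -180, bbox[c(1, 3)] <= 180,  # x: longitude range
      bbox[c(2, 4)] >= -90,  bbox[c(2, 4)] <= 90)   # y: latitude range
}

# Illustrative (made-up) extents
degrees   <- c(xmin = 21.8, ymin = 57.5, xmax = 28.2, ymax = 59.7)
projected <- c(xmin = 370000, ymin = 6370000, xmax = 740000, ymax = 6635000)

looks_like_lonlat(degrees)    # TRUE
looks_like_lonlat(projected)  # FALSE
```

With sf loaded, calling `looks_like_lonlat(st_bbox(places))` right after `st_read()` would flag the problem if the file is labelled EPSG:4326 but stores coordinates in some other unit.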
    

    We can use st_sample to draw points within each polygon, with the number of points per polygon given by our 'value' column:

    # we can set type = 'hexagonal', 'regular' or 'random'
    school_pts <- schools_geo %>% st_sample(size = .$value, type = 'hexagonal')
    
    
    schools_geo %>% 
      ggplot()+
      geom_sf()+
      geom_sf(data=school_pts, size = .8)+
      theme_bw()
    

    This produces the following plot, which I think looks messy because st_sample spreads the points all the way out to the edges of the polygons.

    [Plot: hexagonal st_sample points spread across the full extent of each polygon]
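The points reach the polygon edges because a hexagonal sample is essentially a lattice laid over the polygon and clipped to it. A rough base-R illustration of that idea follows; it is not sf's actual implementation, and `polygon_area`, `point_in_polygon` and `hex_points_in_polygon` are hypothetical helpers. The lattice spacing is chosen so that roughly `n` points land inside, and, as with `st_sample()` on polygons, the number returned is only approximately `n`.

```r
# Shoelace formula: area of a simple polygon given as a 2-column matrix (x, y),
# vertices in order, without repeating the first vertex at the end.
polygon_area <- function(p) {
  x <- p[, 1]; y <- p[, 2]
  j <- c(seq_along(x)[-1], 1)  # index of the next vertex (wrapping around)
  abs(sum(x * y[j] - x[j] * y)) / 2
}

# Ray-casting point-in-polygon test for each row of pts.
point_in_polygon <- function(pts, p) {
  n <- nrow(p)
  inside <- logical(nrow(pts))
  for (k in seq_len(nrow(pts))) {
    px <- pts[k, 1]; py <- pts[k, 2]
    j <- n
    for (i in seq_len(n)) {
      xi <- p[i, 1]; yi <- p[i, 2]; xj <- p[j, 1]; yj <- p[j, 2]
      # toggle when a horizontal ray to the right crosses edge (j, i)
      if (((yi > py) != (yj > py)) &&
          (px < (xj - xi) * (py - yi) / (yj - yi) + xi)) {
        inside[k] <- !inside[k]
      }
      j <- i
    }
  }
  inside
}

# Hexagonal lattice points inside polygon p, aiming for roughly n points.
hex_points_in_polygon <- function(p, n) {
  # each lattice point "owns" a hexagon of area (sqrt(3) / 2) * d^2
  d <- sqrt(polygon_area(p) / n / (sqrt(3) / 2))
  dy <- d * sqrt(3) / 2  # vertical distance between lattice rows
  ys <- seq(min(p[, 2]), max(p[, 2]), by = dy)
  grid <- do.call(rbind, lapply(seq_along(ys), function(r) {
    offset <- if (r %% 2 == 0) d / 2 else 0  # shift every other row
    xs <- seq(min(p[, 1]), max(p[, 1]), by = d) + offset
    cbind(xs, ys[r])
  }))
  # clip the lattice to the polygon: this is why points reach its edges
  grid[point_in_polygon(grid, p), , drop = FALSE]
}

square <- rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10))
pts <- hex_points_in_polygon(square, 27)
nrow(pts)  # close to, but not necessarily exactly, 27
```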

    It might look nicer to have the points more centered in each polygon, like in the example you posted. To do that, we can shrink each polygon around its centroid depending on the number of points we want to plot within it. In the code below, polygons with the fewest points (1) are scaled to 20% of their original linear size and polygons with the most points (27) to 90%, with linear interpolation in between.

    # put values on a reversed 0-1 scale (most points -> 0, fewest -> 1)
    scale_fact <- (max(schools_geo$value) - schools_geo$value) / (max(schools_geo$value) - min(schools_geo$value))
    # re-scale between 0.2 and 0.9 (still reversed: most points -> 0.2)
    scale_fact <- scale_fact * (0.9 - 0.2) + 0.2
    # reverse the scale so that fewest points -> 0.2 and most points -> 0.9
    scale_fact <- max(scale_fact) + min(scale_fact) - scale_fact
    
    # apply the scale factor
    schools_centroid <- st_geometry(st_centroid(schools_geo))
    schools_geo_rescaled <- (st_geometry(schools_geo) - schools_centroid) * scale_fact + schools_centroid
    
    school_pts <- schools_geo_rescaled %>% 
      st_sf(crs = 3301) %>% 
      bind_cols(value = schools_geo$value) %>%
      st_sample(size = .$value, type = 'hexagonal')
    
    
    # plot
    schools_geo %>% 
      ggplot()+
      geom_sf()+
      geom_sf(data=school_pts, size = .8)+
      theme_bw()
    

    [Plot: hexagonal st_sample points drawn within the shrunken polygons, concentrated around each centroid]
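The two steps of the rescaling (turning `value` into a factor between 0.2 and 0.9, then scaling each polygon about its centroid) can be checked in isolation with plain numbers. The base-R sketch below uses hypothetical helpers `rescale_linear` and `scale_about_centroid` and made-up counts. For simplicity it uses the vertex mean as the centre, whereas `st_centroid()` returns the area-weighted centroid (the two coincide for the square used here). Multiplying coordinates by a factor s scales areas by s², so a polygon with factor 0.2 keeps only 4% of its area.

```r
# Map values linearly onto [lo, hi]: smallest value -> lo, largest -> hi.
# Same result as the three-step scale_fact computation above.
rescale_linear <- function(v, lo = 0.2, hi = 0.9) {
  (v - min(v)) / (max(v) - min(v)) * (hi - lo) + lo
}

# Scale a polygon (2-column matrix of vertices) about a centre point.
scale_about_centroid <- function(p, s, centre = colMeans(p)) {
  sweep(sweep(p, 2, centre) * s, 2, centre, FUN = "+")
}

# Shoelace formula for the area of a simple polygon.
polygon_area <- function(p) {
  x <- p[, 1]; y <- p[, 2]
  j <- c(seq_along(x)[-1], 1)
  abs(sum(x * y[j] - x[j] * y)) / 2
}

values <- c(1, 5, 14, 27)   # illustrative point counts
s <- rescale_linear(values)
s                           # from 0.2 (1 point) up to 0.9 (27 points)

# the three-step formula gives the same factors
orig <- (max(values) - values) / (max(values) - min(values))
orig <- orig * (0.9 - 0.2) + 0.2
orig <- max(orig) + min(orig) - orig
all.equal(s, orig)          # TRUE

square <- rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10))
small <- scale_about_centroid(square, s[1])
polygon_area(small) / polygon_area(square)  # 0.04, i.e. 0.2^2
```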