Search code examples
rggplot2binning

Summarise X,Y,theta data in 2D bins before passing to `geom_spoke`


I have data consisting of x,y-coordinates and heading angle that I'd like to divide into 2D bins in order to calculate mean heading for each bin and plot with ggplot's geom_spoke.

Here's an example of what I want to do, with bins created manually:

# data
set.seed(1)
dat <- data.frame(x = runif(100,0,100), y = runif(100,0,100), angle = runif(100, 0, 2*pi))

# manual binning
bins <- rbind(
  #bottom left
  dat %>%
    filter(x < 50 & y < 50) %>%
    summarise(x = 25, y = 25, angle = mean(angle), n = n()),
  #bottom right
  dat %>%
    filter(x > 50 & y < 50) %>%
    summarise(x = 75, y = 25, angle = mean(angle), n = n()),
  #top left
  dat %>%
    filter(x < 50 & y > 50) %>%
    summarise(x = 25, y = 75, angle = mean(angle), n = n()),
  #top right
  dat %>%
    filter(x > 50 & y > 50) %>%
    summarise(x = 75, y = 75, angle = mean(angle), n = n())
)

# plot
ggplot(bins, aes(x, y)) +
  geom_point() +
  coord_equal() +
  scale_x_continuous(limits = c(0,100)) +
  scale_y_continuous(limits = c(0,100)) +
  geom_spoke(aes(angle = angle, radius = n/2), arrow=arrow(length = unit(0.2,"cm")))

Desired geom_spoke plot I know how to create 2D bins containing count data for each bin, e.g.:

# heatmap of x,y counts
p <- ggplot(dat, aes(x, y)) +
  geom_bin2d(binwidth = c(50, 50)) +
  coord_equal()
#ggplot_build(p)$data[[1]] #access binned data

But I can't seem to find a way to summarise other variables such as heading for each bin before passing to geom_spoke. Without first binning, my plot looks like this instead: Bad geom_spoke plot


Solution

  • Here's one approach. You'll need to determine the number / range of bins in each dimension (x & y) once, & everything else should be covered by code:

    # adjust range & number of bins here
    x.range <- pretty(dat$x, n = 3)
    y.range <- pretty(dat$y, n = 3)
    
    > x.range
    [1]   0  50 100
    > y.range
    [1]   0  50 100
    

    Automatically assign each row to a bin based on which x & y intervals it falls into:

    dat <- dat %>%
      rowwise() %>%
      mutate(x.bin = max(which(x > x.range)),
             y.bin = max(which(y > y.range)),
             bin = paste(x.bin, y.bin, sep = "_")) %>%
      ungroup()
    
    > head(dat)
    # A tibble: 6 x 6
             x        y    angle x.bin y.bin   bin
         <dbl>    <dbl>    <dbl> <int> <int> <chr>
    1 26.55087 65.47239 1.680804     1     2   1_2
    2 37.21239 35.31973 1.373789     1     1   1_1
    3 57.28534 27.02601 3.247130     2     1   2_1
    4 90.82078 99.26841 1.689866     2     2   2_2
    5 20.16819 63.34933 1.138314     1     2   1_2
    6 89.83897 21.32081 3.258310     2     1   2_1
    

    Calculate the mean values for each bin:

    dat <- dat %>%
      group_by(bin) %>%
      mutate(x.mean = mean(x),
             y.mean = mean(y),
             angle.mean = mean(angle),
             n = n()) %>%
      ungroup()
    
    > head(dat)
    # A tibble: 6 x 10
             x        y    angle x.bin y.bin   bin   x.mean   y.mean angle.mean     n
         <dbl>    <dbl>    <dbl> <int> <int> <chr>    <dbl>    <dbl>      <dbl> <int>
    1 26.55087 65.47239 1.680804     1     2   1_2 26.66662 68.56461   2.672454    29
    2 37.21239 35.31973 1.373789     1     1   1_1 33.05887 28.86027   2.173177    23
    3 57.28534 27.02601 3.247130     2     1   2_1 74.71214 24.99131   3.071629    23
    4 90.82078 99.26841 1.689866     2     2   2_2 77.05622 77.91031   3.007859    25
    5 20.16819 63.34933 1.138314     1     2   1_2 26.66662 68.56461   2.672454    29
    6 89.83897 21.32081 3.258310     2     1   2_1 74.71214 24.99131   3.071629    23
    

    Plot without hard-coding any bin number / bin width:

    ggplot(dat,
           aes(x, y, fill = bin)) +
      geom_bin2d(binwidth = c(diff(x.range)[1], 
                              diff(y.range)[1])) +
      geom_point(aes(x = x.mean, y = y.mean)) +
      geom_spoke(aes(x = x.mean, y = y.mean, angle = angle.mean, radius = n/2),
                 arrow=arrow(length = unit(0.2,"cm"))) +
      coord_equal()
    

    ggplot

    Other details such as the choice of fill palette, legend label, plot title, etc can be tweaked subsequently.