Search code examples
rdataframedplyrcountgrid

Count the number of datapoints in each "cell" of a grid


I have a dataframe with three columns, Subject, Rt1 and Rt2.

df<-data.frame(Subject = c(rep(1,15), rep(2,15)), RT1=rep(seq(100,1500,100),2), RT2 = rep(seq(200,3000,200),2))

For various reasons, I want to count the number/frequency of datapoints that fall on each cell of a 4*4 grid, for each subject. Here, the datapoints are the same for the 2 subjects, but for the actual data, it is not the case.

For example, let's say the value of my grid are :

x_axis <- c(220, 350, 700, 1250, 1800)
y_axis <- c(100, 700, 1400, 2500, 3500)

So basically I just want to get something that return me the number of point in each cell of the grid above, the desire output being something like :

df2<-data.frame("<x1"= c(1,0,0,0,0),
                "x1-x2"= c(0,2,0,0,0),
                "x2-x3"= c(0,0,4,0,0),
                "x4-x5"= c(0,0,0,3,0)
                )

Please note that this example is only for one subject (for clarity sake)

I do know how to do this, by simply counting the number of point in each "case", like :

df%>%
  group_by(Subject)%>%
  count(between(df$RT1,0,180),between(df$RT2 ,0,210))%>%
  set_names(c("RT1","RT2","n"))%>%
  filter(if_all(RT1:RT2, ~ . %in% TRUE))

However this is highly inefficient, as I should paste this code for each and every cell of all of my conditions.

I also have a solution using table and cut, but I don't know how to group by subject with this method :

table(
  cut(df$RT2, c(0, y_axis), right = FALSE),
  cut(df$RT1, c(0, x_axis), right = FALSE)
)

Update: If you can make it work grouped by Subject, that would be amazing!

df<-data.frame(Subject = c(rep(1,15), rep(2,15)), RT1=rep(seq(100,1500,100),2), RT2 = rep(seq(200,3000,200),2))

Solution

  • Steps:

    1. Use the cut() function to break RT1 and RT2 into groups, according to the slices set out in x_axis and y_axis
    2. Get the count of each x and y group we just created.
    out <- df |> 
      mutate(x = cut(RT1, c(0, x_axis), labels = x_axis),
                y = cut(RT2, y_axis, labels = y_axis[-1])) |>
                count(x, y)
    

    For multiple subjects, we literally add two words:

    df |> 
      mutate(x = cut(RT1, c(0, x_axis), labels = x_axis),
                y = cut(RT2, y_axis, labels = y_axis[-1]), .by = Subject) |>
                count(x, y, Subject)
    

    Output for original:

         x    y  n
    1  220  700  4
    2  350  700  2
    3  700 1400  8
    4 1250 2500 10
    5 1800 3500  6
    

    If you insist on it being in the wider format, (despite it being untidy/bad practice to store data like this, and the zeroes just adding more clutter imo), you can use pivot_wider():

    out |> pivot_wider(names_from= y, values_from = n, values_fill = 0)
    

    Output:

    # A tibble: 5 × 5
      x     `700` `1400` `2500` `3500`
      <fct> <int>  <int>  <int>  <int>
    1 220       4      0      0      0
    2 350       2      0      0      0
    3 700       0      8      0      0
    4 1250      0      0     10      0
    5 1800      0      0      0      6
    

    Notes:

    • My table is slightly different to your table, because it appears that the cuts in your y_axis are slightly different from the ones in your screenshot (and presumably your output). The smallest one is 100 in your input, but in the plot it's around 220.