I have a dataframe with three columns, Subject, Rt1 and Rt2.
df<-data.frame(Subject = c(rep(1,15), rep(2,15)), RT1=rep(seq(100,1500,100),2), RT2 = rep(seq(200,3000,200),2))
For various reasons, I want to count the number/frequency of datapoints that fall on each cell of a 4*4 grid, for each subject. Here, the datapoints are the same for the 2 subjects, but for the actual data, it is not the case.
For example, let's say the value of my grid are :
x_axis <- c(220, 350, 700, 1250, 1800)
y_axis <- c(100, 700, 1400, 2500, 3500)
So basically I just want to get something that return me the number of point in each cell of the grid above, the desire output being something like :
df2<-data.frame("<x1"= c(1,0,0,0,0),
"x1-x2"= c(0,2,0,0,0),
"x2-x3"= c(0,0,4,0,0),
"x4-x5"= c(0,0,0,3,0)
)
Please note that this example is only for one subject (for clarity sake)
I do know how to do this, by simply counting the number of point in each "case", like :
df%>%
group_by(Subject)%>%
count(between(df$RT1,0,180),between(df$RT2 ,0,210))%>%
set_names(c("RT1","RT2","n"))%>%
filter(if_all(RT1:RT2, ~ . %in% TRUE))
However this is highly inefficient, as I should paste this code for each and every cell of all of my conditions.
I also have a solution using table and cut, but I don't know how to group by subject with this method :
table(
cut(df$RT2, c(0, y_axis), right = FALSE),
cut(df$RT1, c(0, x_axis), right = FALSE)
)
Update: If you can make it work grouped by Subject, that would be amazing!
df<-data.frame(Subject = c(rep(1,15), rep(2,15)), RT1=rep(seq(100,1500,100),2), RT2 = rep(seq(200,3000,200),2))
Steps:
cut()
function to break RT1
and RT2
into groups, according to the slices set out in x_axis
and y_axis
out <- df |>
mutate(x = cut(RT1, c(0, x_axis), labels = x_axis),
y = cut(RT2, y_axis, labels = y_axis[-1])) |>
count(x, y)
For multiple subjects, we literally add two words:
df |>
mutate(x = cut(RT1, c(0, x_axis), labels = x_axis),
y = cut(RT2, y_axis, labels = y_axis[-1]), .by = Subject) |>
count(x, y, Subject)
Output for original:
x y n
1 220 700 4
2 350 700 2
3 700 1400 8
4 1250 2500 10
5 1800 3500 6
If you insist on it being in the wider format, (despite it being untidy/bad practice to store data like this, and the zeroes just adding more clutter imo), you can use pivot_wider()
:
out |> pivot_wider(names_from= y, values_from = n, values_fill = 0)
Output:
# A tibble: 5 × 5
x `700` `1400` `2500` `3500`
<fct> <int> <int> <int> <int>
1 220 4 0 0 0
2 350 2 0 0 0
3 700 0 8 0 0
4 1250 0 0 10 0
5 1800 0 0 0 6
Notes: