So I have a data frame of participant data, where I have participant IDs, for each of those a bunch of target values (continuous) and predicted values.
The target value is a continuous variable, but there is a finite number of possible values, and each participant will have made a prediction for a subset of these target values.
For example, take this data frame:
data.frame(
subjectID = c(rep("p001",4),rep("p002",4),rep("p003",4)),
target = c(0.1,0.2,0.3,0.4,0.2,0.3,0.4,0.5,0.1,0.3,0.4,0.5),
pred = c(0.12, 0.23, 0.31, 0.42, 0.18, 0.32, 0.44, 0.51, 0.09, 0.33, 0.41, 0.55)
)
There're 5 possible target values: 0.1, 0.2, 0.3, 0.4 and 0.5, but each participant only predicted 4 of these values each. I want to get the average prediction pred
for each target value target
. It's further complicated by each participant having a group, and I only want to average within each group.
I tried using summarise_at
but it wasn't liking the continuous data, and whilst I'm pretty experienced in coding in R, it's been a long while since I've done data summary manipulations etc.
I could do this easily in a for loop, but I want to learn to do this properly and I wasn't able to find a solution after googling for a long time.
Thanks very much
H
Just add the second grouping variable in group_by
as well:
df <- data.frame(
subjectID = c(rep("p001",4),rep("p002",4),rep("p003",4)),
group = c(rep("A", 8), rep("B", 4)),
target = c(0.1,0.2,0.3,0.4,0.2,0.3,0.4,0.5,0.1,0.3,0.4,0.5),
pred = c(0.12, 0.23, 0.31, 0.42, 0.18, 0.32, 0.44, 0.51, 0.09, 0.33, 0.41, 0.55)
)
df %>%
group_by(target, group) %>%
summarise(mean(pred))
Output:
# A tibble: 9 x 3
# Groups: target [?]
target group `mean(pred)`
<dbl> <chr> <dbl>
1 0.100 A 0.120
2 0.100 B 0.0900
3 0.200 A 0.205
4 0.300 A 0.315
5 0.300 B 0.330
6 0.400 A 0.430
7 0.400 B 0.410
8 0.500 A 0.510
9 0.500 B 0.550