Search code examples
rsummary

R - using dplyr to aggregate on a continuous variable


So I have a data frame of participant data, where I have participant IDs, for each of those a bunch of target values (continuous) and predicted values.

The target value is a continuous variable, but there is a finite number of possible values, and each participant will have made a prediction for a subset of these target values.

For example, take this data frame:

data.frame(
    subjectID = c(rep("p001",4),rep("p002",4),rep("p003",4)), 
    target = c(0.1,0.2,0.3,0.4,0.2,0.3,0.4,0.5,0.1,0.3,0.4,0.5),
    pred = c(0.12, 0.23, 0.31, 0.42, 0.18, 0.32, 0.44, 0.51, 0.09, 0.33, 0.41, 0.55)
)

There're 5 possible target values: 0.1, 0.2, 0.3, 0.4 and 0.5, but each participant only predicted 4 of these values each. I want to get the average prediction pred for each target value target. It's further complicated by each participant having a group, and I only want to average within each group.

I tried using summarise_at but it wasn't liking the continuous data, and whilst I'm pretty experienced in coding in R, it's been a long while since I've done data summary manipulations etc.

I could do this easily in a for loop, but I want to learn to do this properly and I wasn't able to find a solution after googling for a long time.

Thanks very much

H


Solution

  • Just add the second grouping variable in group_by as well:

    df <- data.frame(
      subjectID = c(rep("p001",4),rep("p002",4),rep("p003",4)), 
      group = c(rep("A", 8), rep("B", 4)),
      target = c(0.1,0.2,0.3,0.4,0.2,0.3,0.4,0.5,0.1,0.3,0.4,0.5),
      pred = c(0.12, 0.23, 0.31, 0.42, 0.18, 0.32, 0.44, 0.51, 0.09, 0.33, 0.41, 0.55)
    )
    
    df %>%
      group_by(target, group) %>%
      summarise(mean(pred))
    

    Output:

    # A tibble: 9 x 3
    # Groups:   target [?]
      target group `mean(pred)`
       <dbl> <chr>        <dbl>
    1  0.100 A           0.120 
    2  0.100 B           0.0900
    3  0.200 A           0.205 
    4  0.300 A           0.315 
    5  0.300 B           0.330 
    6  0.400 A           0.430 
    7  0.400 B           0.410 
    8  0.500 A           0.510 
    9  0.500 B           0.550