R data.table: leave-out mean by group


I'm looking for an efficient solution, preferably in data.table, to compute leave-out means by group. To be precise, for each id I want the mean of the values that belong to the other ids in the same group. The following example illustrates what I want:

    group  id  value  desired_output
    a       1     10            17.5
    a       2     15            15
    a       3     20            12.5
    b       4     10            20
    b       4     15            20
    b       5     20            12.5

    df <- structure(list(group = c("a", "a", "a", "b", "b", "b"),
                         id = c(1, 2, 3, 4, 4, 5),
                         value = c(10, 15, 20, 10, 15, 20)),
                    class = "data.frame", row.names = c(NA, -6L))

How can I accomplish this?


Solution

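  • By the definition of the mean, the leave-out mean is the group sum minus the id's own sum, divided by the group row count minus the id's own row count: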

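    library(data.table)
    setDT(df)  # df is a data.frame; convert it to a data.table by reference

    # Sum and row count of each group
    df[, ":="(sum_group = sum(value), n_group = .N), by = group]

    # Remove each id's own sum and count; grouping by group too keeps ids reused across groups apart
    df[, desired_output := (sum_group - sum(value)) / (n_group - .N), by = .(group, id)]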
    
    #     group    id value sum_group n_group desired_output
    #    <char> <num> <num>     <num>   <int>          <num>
    # 1:      a     1    10        45       3           17.5
    # 2:      a     2    15        45       3           15.0
    # 3:      a     3    20        45       3           12.5
    # 4:      b     4    10        45       3           20.0
    # 5:      b     4    15        45       3           20.0
    # 6:      b     5    20        45       3           12.5
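
    The arithmetic behind the two steps above can be written out explicitly. The symbols below are notation introduced here for illustration and do not appear in the code: S_g and n_g are the sum and row count of group g, and s_{g,i} and n_{g,i} are the sum and row count of the rows in group g that have id i.

    ```latex
    % Leave-out mean for id i in group g
    \[
      m_{g,i} = \frac{S_g - s_{g,i}}{n_g - n_{g,i}}
    \]
    % Group a, id 1 (one row, value 10)
    \[
      m_{a,1} = \frac{45 - 10}{3 - 1} = 17.5
    \]
    % Group b, id 4 (two rows, values 10 and 15)
    \[
      m_{b,4} = \frac{45 - 25}{3 - 2} = 20
    \]
    ```

    If a group contains only a single id, both numerator and denominator are zero, so R returns NaN for those rows.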