I'm looking for an efficient solution, preferably in data.table, to compute leave-out means by group.
To be precise, for each value of id I want to compute the mean of the values that belong to the remaining ids in the same group.
The following example illustrates what I want:
group  id  value  desired_output
a       1     10            17.5
a       2     15            15
a       3     20            12.5
b       4     10            20
b       4     15            20
b       5     20            12.5
df <- structure(list(group = c("a", "a", "a", "b", "b", "b"), id = c(1,
2, 3, 4, 4, 5), value = c(10, 15, 20, 10, 15, 20)), class = "data.frame", row.names = c(NA,
-6L))
How can I accomplish this?
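Using the definition of the mean, the leave-out mean for an id is the group sum minus that id's sum, divided by the group row count minus that id's row count: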
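library(data.table)
setDT(df)  # the example df is a plain data.frame; := requires a data.table
df[, ":="(sum_group = sum(value), n_group = .N), by = group]  # per-group sum and row count
df[, desired_output := (sum_group - sum(value)) / (n_group - .N), by = .(group, id)]  # group by both, in case an id occurs in several groups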
# group id value sum_group n_group desired_output
# <char> <num> <num> <num> <int> <num>
# 1: a 1 10 45 3 17.5
# 2: a 2 15 45 3 15.0
# 3: a 3 20 45 3 12.5
# 4: b 4 10 45 3 20.0
# 5: b 4 15 45 3 20.0
# 6: b 5 20 45 3 12.5
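One edge case worth noting: if an id is the only one in its group, both the numerator and the denominator above are 0, so the result is NaN. The following is a minimal sketch of the same approach wrapped in a function that returns NA in that case and leaves the input untouched. The function name leave_out_mean, the column name loo_mean, and the extra single-id group "c" are hypothetical and were added only for illustration.

```r
library(data.table)

# Hypothetical helper (not part of the original answer): for each row, the
# mean of `value` over all rows in the same group that have a different id.
leave_out_mean <- function(dt) {
  dt <- copy(as.data.table(dt))  # work on a copy so the caller's data is not modified
  dt[, c("sum_group", "n_group") := .(sum(value), .N), by = group]
  dt[, loo_mean := {
    n_rest <- n_group - .N  # rows left in the group after dropping this id
    # Nothing left to average when the id is alone in its group: NA instead of NaN
    ifelse(n_rest > 0, (sum_group - sum(value)) / n_rest, NA_real_)
  }, by = .(group, id)]
  dt[, c("sum_group", "n_group") := NULL]  # drop the helper columns
  dt[]
}

# Example data from the question plus an assumed single-id group "c"
df <- data.frame(
  group = c("a", "a", "a", "b", "b", "b", "c"),
  id    = c(1, 2, 3, 4, 4, 5, 6),
  value = c(10, 15, 20, 10, 15, 20, 30)
)
res <- leave_out_mean(df)
res$loo_mean
# 17.5 15.0 12.5 20.0 20.0 12.5 NA
```

Grouping by both group and id in the second step keeps the result correct even if the same id value were to appear in more than one group.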