dat <- data.frame(
  group = c("a", "b", "c", "d", "e", "total"),
  count = c(NA, NA, 10, 21, 49, 85)
)
dat
  group count
1     a    NA
2     b    NA
3     c    10
4     d    21
5     e    49
6 total    85
Given the data frame above, how can I impute the NA values so that the counts for a-e sum to total?
A solution could either generate a nested data frame of all possibilities, or replace each NA with a distribution or something similar. Thanks!
One way would be to use RcppAlgos::permuteGeneral() to generate all permutations that sum to the target. From there, one permutation can be selected at random to replace the NAs.
library(RcppAlgos)
# Count NAs
n <- sum(is.na(dat$count))
# Find sum target
target <- dat$count[dat$group == "total"] - sum(dat$count[dat$group != "total"], na.rm = TRUE)
# Generate permutations of n values that sum to target
res <- permuteGeneral(
  0:min(9, target), # Ensure all values are less than 10
  n,
  repetition = TRUE,
  constraintFun = "sum",
  comparisonFun = "==",
  limitConstraints = target
)
# Permutations that meet the constraints:
res
     [,1] [,2]
[1,]    0    5
[2,]    5    0
[3,]    1    4
[4,]    4    1
[5,]    2    3
[6,]    3    2
# Replace NA values with one randomly chosen permutation
# (the draw is random; call set.seed() first for a reproducible result)
dat$count[is.na(dat$count)] <- res[sample(nrow(res), 1), ]
dat
  group count
1     a     3
2     b     2
3     c    10
4     d    21
5     e    49
6 total    85
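If you would rather have every possibility instead of a single random draw, or want to avoid the extra dependency, the same enumeration can be sketched in base R. This is only a rough alternative, not what RcppAlgos does internally: it builds the full grid with expand.grid() and filters it, so it is only practical for a small number of NAs. The names missing, candidates and completed are made up for this illustration, and the 0 to 9 range is the same assumption as in the code above.

```r
# Rebuild the example data (same values as in the question)
dat <- data.frame(
  group = c("a", "b", "c", "d", "e", "total"),
  count = c(NA, NA, 10, 21, 49, 85)
)

is_total <- dat$group == "total"
missing  <- is.na(dat$count)
n        <- sum(missing)

# Amount the missing cells must add up to: 85 - (10 + 21 + 49) = 5
target <- dat$count[is_total] - sum(dat$count[!is_total], na.rm = TRUE)

# Every n-tuple drawn from 0..min(9, target), one column per missing group
grid <- expand.grid(rep(list(0:min(9, target)), n))
names(grid) <- as.character(dat$group[missing])

# Keep only the tuples whose values sum to the target
candidates <- grid[rowSums(grid) == target, , drop = FALSE]
rownames(candidates) <- NULL

# One completed copy of dat per candidate (all possibilities as a list)
completed <- lapply(seq_len(nrow(candidates)), function(i) {
  out <- dat
  out$count[missing] <- unlist(candidates[i, ])
  out
})
```

For the example data, candidates has six rows, the same six combinations as res above, and completed holds one filled-in data frame per row. From there you could pick one with sample(), or summarise each column of candidates to get a distribution for each missing cell.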