
How to impute NA values or create all possible combinations?


dat <- data.frame(
  group = c("a", "b", "c", "d", "e", "total"),
  count = c(NA, NA, 10, 21, 49, 85)
)
dat
  group count
1     a    NA
2     b    NA
3     c    10
4     d    21
5     e    49
6 total    85

Given the above data frame, how can I impute the NA values, so that

  1. the counts of a-e sum to the total
  2. each imputed value is less than 10?

A solution could either generate a nested data frame of all possibilities or replace each NA with a distribution, or something similar. Thanks!


Solution

  • One way would be to use RcppAlgos::permuteGeneral() to generate all permutations of candidate values that sum to the target (with the data frame stored as dat). From there, one permutation can be selected at random to replace the NAs.

    library(RcppAlgos)
    
    # Count NAs 
    n <- sum(is.na(dat$count))
    
    # Find sum target
    target <- dat$count[dat$group == "total"] - sum(dat$count[dat$group != "total"], na.rm = TRUE)
    
    # Generate permutations of n values that sum to target
    res <- permuteGeneral(
      0:min(9, target),  # Ensure all values are less than 10
      n,
      repetition = TRUE,
      constraintFun = "sum",
      comparisonFun = "==",
      limitConstraints = target
    )
    
    # Permutations that meet the constraints:
    res
         [,1] [,2]
    [1,]    0    5
    [2,]    5    0
    [3,]    1    4
    [4,]    4    1
    [5,]    2    3
    [6,]    3    2
    
    # Replace NA values with one randomly chosen permutation (one row of res)
    dat$count[is.na(dat$count)] <- res[sample(nrow(res), 1), ]
    
    # One possible result; the selected row varies between runs
    dat
      group count
    1     a     3
    2     b     2
    3     c    10
    4     d    21
    5     e    49
    6 total    85
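
If a list of every possible completed data frame is wanted instead of a single random draw, the same idea can be sketched in base R with expand.grid(). This is only a minimal sketch: the names na_idx, is_total, grid, candidates and all_filled are made up for illustration, and it assumes (as the code above does) that 0 is an acceptable imputed value. Because expand.grid() builds every combination before filtering, it is only practical when few values are missing.

```r
# Example data from the question
dat <- data.frame(
  group = c("a", "b", "c", "d", "e", "total"),
  count = c(NA, NA, 10, 21, 49, 85)
)

na_idx   <- which(is.na(dat$count))   # positions of the missing counts
is_total <- dat$group == "total"      # flags the "total" row

# Amount the missing values must add up to: 85 - (10 + 21 + 49) = 5
target <- dat$count[is_total] - sum(dat$count[!is_total], na.rm = TRUE)

# Every combination of the values 0..min(9, target), one column per NA
grid <- expand.grid(rep(list(0:min(9, target)), length(na_idx)))
names(grid) <- dat$group[na_idx]

# Keep only the combinations whose values sum to the target
candidates <- grid[rowSums(grid) == target, , drop = FALSE]
rownames(candidates) <- NULL

# Build one completed copy of dat per candidate row
all_filled <- lapply(seq_len(nrow(candidates)), function(i) {
  filled <- dat
  filled$count[na_idx] <- unlist(candidates[i, ])
  filled
})
```

Here candidates holds one row per valid pair of values for a and b, and all_filled[[i]] is the data frame completed with row i of candidates.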