
How to impute values from groups in a category with `sapply()`?


I want to impute the missing values in `val` for every country (`ctry`) in category 1 (`cat == 1`), using that country's own mean.

Data example

set.seed(654)
df1 <- data.frame(
  year=rep(2000:2005, each=5),
  ctry=rep(LETTERS[1:5], 6),
  val=rnorm(30)
)
df1$cat <- ifelse(df1$ctry %in% c("A", "B"), 1, 0)
df1[sample(nrow(df1), 12), "val"] <- NA
> head(df1)
  year ctry         val cat
1 2000    A -0.76031762   1
2 2000    B -0.38970450   1
3 2000    C  1.68962523   0
4 2000    D          NA   0
5 2000    E  0.09530146   0
6 2001    A          NA   1

First, I get the names of the countries in category 1 and compute their means.

cat1 <- as.character(sort(unique(
  df1[!is.na(df1$val) & df1$cat == 1, ]
  [, 2])))
cat1 <- sapply(cat1, function(x) mean(df1$val[df1$ctry == x], na.rm=TRUE))
> cat1
        A         B 
0.4372003 0.4792314 

Imputing manually, one country at a time, works:

df2 <- df1
df2$val[df2$ctry %in% names(cat1)[1] & is.na(df2$val)] <- cat1[1]
> head(df2)
  year ctry         val cat
1 2000    A -0.76031762   1
2 2000    B -0.38970450   1
3 2000    C  1.68962523   0
4 2000    D          NA   0
5 2000    E  0.09530146   0
6 2001    A -0.49758245   1

But for some reason I can't get this sapply() to do the imputation automatically; it only returns the values and leaves df2 unchanged:

> sapply(seq_along(cat1), 
+        function(x) df2$val[df2$ctry %in% names(cat1)[x] & is.na(df2$val)] <- cat1[x])
         A          B 
-0.4975825 -0.6139364 

The expected output is the whole data frame, with the missing values of the countries in category 1 replaced by each country's mean.


Solution

  • In Base R:

    set.seed(654)
    df1 <- data.frame(
      year=rep(2000:2005, each=5),
      ctry=rep(LETTERS[1:5], 6),
      val=rnorm(30)
    )
    df1$cat <- ifelse(df1$ctry %in% c("A", "B"), 1, 0)
    df1[sample(nrow(df1), 12), "val"] <- NA
    
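    # Mean of the observed values for each country (names are the country codes).
    my_means <- tapply(df1$val, df1$ctry, mean, na.rm = TRUE)
    # Fill each NA with its country's mean, looked up by name.
    # Note: this fills every country, not only those with cat == 1.
    df1$val <- ifelse(is.na(df1$val), my_means[as.character(df1$ctry)], df1$val)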
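
As for why the `sapply()` attempt in the question left `df2` untouched: an assignment such as `df2$val[...] <- value` inside a function modifies a copy of `df2` that is local to that function call, and the call then returns the assigned value, which is what `sapply()` printed. The following is a minimal sketch, not part of the original answer, that uses a plain `for` loop instead and restricts the imputation to the `cat == 1` countries as the question asks. The names `in_cat1`, `k`, and `fill` are introduced here for illustration only.

```r
# Rebuild the example data from the question.
set.seed(654)
df1 <- data.frame(
  year = rep(2000:2005, each = 5),
  ctry = rep(LETTERS[1:5], 6),
  val  = rnorm(30),
  stringsAsFactors = FALSE
)
df1$cat <- ifelse(df1$ctry %in% c("A", "B"), 1, 0)
df1[sample(nrow(df1), 12), "val"] <- NA

# Per-country means of the observed values, cat == 1 countries only.
# vapply() over split() gives a plain named numeric vector.
in_cat1 <- df1$cat == 1
cat1 <- vapply(
  split(df1$val[in_cat1], df1$ctry[in_cat1]),
  mean, numeric(1), na.rm = TRUE
)

# A for loop runs in the calling environment, so the assignment
# changes df2 itself rather than a function-local copy.
df2 <- df1
for (k in names(cat1)) {
  fill <- df2$ctry == k & is.na(df2$val)  # rows of country k that need a value
  df2$val[fill] <- cat1[[k]]
}
```

After the loop, rows with `cat == 0` still contain their original `NA`s, while every missing `val` for a `cat == 1` country holds that country's mean. Using `<<-` inside the `sapply()` function would also reach the outer `df2`, but a loop keeps the side effect explicit.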