Search code examples
rgroup-bymatchlookupimputation

R data imputation from group_by table


group = c(1,1,4,4,4,5,5,6,1,4,6)
animal = c('a','b','c','c','d','a','b','c','b','d','c')
sleep = c(14,NA,22,15,NA,96,100,NA,50,2,1)

test = data.frame(group, animal, sleep)
print(test)

group_animal = test %>% group_by(`group`, `animal`) %>% summarise(mean_sleep = mean(sleep, na.rm = T))

I would like to replace the NA values the sleep column based on the mean sleep value grouped by group and animal.

Is there any way that I can perform some sort of lookup like Excel that matches group and animal from the test dataframe to the group_animal dataframe and replaces the NA value in the sleep column from the test df with the sleep value in the group_animal df?


Solution

  • We could use mutate instead of summarise as summarise returns a single row per group

    library(dplyr)
    library(tidyr)
    test <- test %>% 
      group_by(group, animal) %>% 
      mutate(sleep = replace_na(sleep, mean(sleep, na.rm = TRUE))) %>%
      ungroup
    

    -output

    test
    # A tibble: 11 × 3
       group animal sleep
       <dbl> <chr>  <dbl>
     1     1 a         14
     2     1 b         50
     3     4 c         22
     4     4 c         15
     5     4 d          2
     6     5 a         96
     7     5 b        100
     8     6 c          1
     9     1 b         50
    10     4 d          2
    11     6 c          1