Search code examples
rdplyrmissing-datazoo

Replace NA with previous or next value, by group, using dplyr


I have a data frame which is arranged by descending order of date.

ps1 = data.frame(userID = c(21,21,21,22,22,22,23,23,23), 
             color = c(NA,'blue','red','blue',NA,NA,'red',NA,'gold'), 
             age = c('3yrs','2yrs',NA,NA,'3yrs',NA,NA,'4yrs',NA), 
             gender = c('F',NA,'M',NA,NA,'F','F',NA,'F') 
)

I wish to impute(replace) NA values with previous values and grouped by userID In case the first row of a userID has NA then replace with the next set of values for that userid group.

I am trying to use dplyr and zoo packages something like this...but its not working

cleanedFUG <- filteredUserGroup %>%
 group_by(UserID) %>%
 mutate(Age1 = na.locf(Age), 
     Color1 = na.locf(Color), 
     Gender1 = na.locf(Gender) ) 

I need result df like this:

                      userID color  age gender
                1     21  blue 3yrs      F
                2     21  blue 2yrs      F
                3     21   red 2yrs      M
                4     22  blue 3yrs      F
                5     22  blue 3yrs      F
                6     22  blue 3yrs      F
                7     23   red 4yrs      F
                8     23   red 4yrs      F
                9     23  gold 4yrs      F

Solution

  • library(tidyr) #fill is part of tidyr
    
    ps1 %>% 
      group_by(userID) %>% 
      #fill(color, age, gender) %>% #default direction down
      fill(color, age, gender, .direction = "downup")
    

    Which gives you:

    Source: local data frame [9 x 4]
    Groups: userID [3]
    
      userID  color    age gender
       <dbl> <fctr> <fctr> <fctr>
    1     21   blue   3yrs      F
    2     21   blue   2yrs      F
    3     21    red   2yrs      M
    4     22   blue   3yrs      F
    5     22   blue   3yrs      F
    6     22   blue   3yrs      F
    7     23    red   4yrs      F
    8     23    red   4yrs      F
    9     23   gold   4yrs      F