Search code examples
rdplyrgroup-bynaimputation

replacing NA with next available number within a group


I have a relatively large dataset and I want to replace NA value for the price in a specific year and for a specific ID number with an available value in next year within a group for the same ID number. Here is a reproducible example:

ID <- c(1,2,3,2,2,3,1,4,5,5,1,2,2)
year <- c(2000,2001,2002,2002,2003,2007,2001,2000,2005,2006,2002,2004,2005)
value <- c(1000,20000,30000,NA,40000,NA,6000,4000,NA,20000,7000,50000,60000)
data <- data.frame(ID, year, value)

   ID year value
1   1 2000  1000
2   2 2001 20000
3   3 2002 30000
4   2 2002    NA
5   2 2003 40000
6   3 2007    NA
7   1 2001  6000
8   4 2000  4000
9   5 2005    NA
10  5 2006 20000
11  1 2002  7000
12  2 2004 50000
13  2 2005 60000

So, for example for ID=2 we have following value and years:

   ID year value
   2 2001  20000
   2 2002  NA
   2 2003  40000
   2 2004  50000
   2 2005  60000

So in the above case, NA should be replaced with 40000 (Values in next year). And the same story for other IDs. the final result should be in this form:

   ID year value
   1 2000  1000
   1 2001  6000
   1 2002  7000
   2 2001  20000
   2 2002  40000
   2 2003  40000
   2 2004  50000
   2 2005  60000
   3 2007    NA
   4 2000  4000
   5 2005  20000
   5 2006  20000

Please note that for ID=3 since there is no next year available, we want to leave it as is. That's why it's in the form of NA

I appreciate if you can suggest a solution
Thanks


Solution

  • dplyr solution

    library(tidyverse)
    
    data2 <- data %>%
      dplyr::group_by(ID) %>%
      dplyr::arrange(year) %>% 
      dplyr::mutate(replaced_value = ifelse(is.na(value), lead(value), value))
    
    print(data2)
    # A tibble: 13 x 4
    # Groups:   ID [5]
          ID  year value replaced_value
       <dbl> <dbl> <dbl>          <dbl>
     1     1  2000  1000           1000
     2     4  2000  4000           4000
     3     2  2001 20000          20000
     4     1  2001  6000           6000
     5     3  2002 30000          30000
     6     2  2002    NA          40000
     7     1  2002  7000           7000
     8     2  2003 40000          40000
     9     2  2004 50000          50000
    10     5  2005    NA          20000
    11     2  2005 60000          60000
    12     5  2006 20000          20000
    13     3  2007    NA             NA