R: impute missing data in two columns


I have a dataset like this.

  ID   Yr    Month
  1    3     NA
  2    4     23
  3    NA    46
  4    1     19
  5    NA    NA

I would like to create a new column, Age, where

 Case 1: Age = Yr, if only Month is missing
 Case 2: Age = Yr + Month/12, if neither Yr nor Month is missing
 Case 3: Age = Month/12, if only Yr is missing
 Case 4: Age = NA, if both Yr and Month are missing

The final expected dataset should look like this.

  ID   Yr    Month   Age
  1    3     NA      3
  2    4     23      5.92
  3    NA    46      3.83
  4    1     19      2.58 
  5    NA    NA      NA

I can accomplish this with 30 lines of code, but I am looking for a simpler and more efficient solution. Any suggestions are much appreciated; thanks in advance.


Solution

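  • You can express the conditions in a single case_when() call. The conditions are checked in order and the first match wins, so the both-missing case goes first.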

    library(dplyr)
    
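    df %>%
      mutate(Age = case_when(
        is.na(Month) & is.na(Yr) ~ NA_real_,        # Case 4: both missing
        is.na(Month)             ~ as.numeric(Yr),  # Case 1: only Month missing
        is.na(Yr)                ~ Month / 12,      # Case 3: only Yr missing
        TRUE                     ~ Yr + Month / 12  # Case 2: both present
      ))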
    
    #  ID Yr Month      Age
    #1  1  3    NA 3.000000
    #2  2  4    23 5.916667
    #3  3 NA    46 3.833333
    #4  4  1    19 2.583333
    #5  5 NA    NA       NA
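
If you would rather avoid the dplyr dependency, the same rule can be sketched in base R. This is only an illustration and not part of the answer above: it assumes the data frame is called df (rebuilt here from the question's table, with numeric columns), sums Yr and Month/12 while ignoring missing values, and then restores NA for rows where both are missing.

```r
# Sample data rebuilt from the question's table (assumed to be numeric columns).
df <- data.frame(
  ID    = 1:5,
  Yr    = c(3, 4, NA, 1, NA),
  Month = c(NA, 23, 46, 19, NA)
)

# Row-wise sum of Yr and Month/12, treating a missing value as 0.
df$Age <- rowSums(cbind(df$Yr, df$Month / 12), na.rm = TRUE)

# rowSums() returns 0 when every value in a row is NA, so put NA back
# for rows where both Yr and Month are missing (Case 4).
df$Age[is.na(df$Yr) & is.na(df$Month)] <- NA

print(df)
```

This should produce the same Age values as the case_when() version; the explicit conditions in case_when() are arguably easier to read when the rules get more involved.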