Replace NA in a series of variables by a factor level


This is my data, and I want to replace every NA with "No". I can replace the missing values one column at a time, but in my code I need to replace the NAs across s_1:s_4 in a single step. As a reminder, all of these variables are factors.

id  x   s_0 s_1 s_2 s_3
1   5   75  A   4   110
2   9   36  NA  NA  921
3   11  13  B   7   769
4   11  34  C   2   912
5   11  NA  C   NA  835
6   13  39  NA  4   NA
7   14  45  B   4   577
8   19  42  D   6   NA
9   20  4   NA  7   577
10  13  28  NA  3   573 


Solution

  • If these columns are already factors, you can use forcats::fct_na_value_to_level() (available in forcats 1.0.0 and later):

    library(dplyr)
    library(forcats)
    
    # Convert the s_ columns of the sample data to factors
    dat <- dat %>%
      mutate(across(starts_with("s_"), as.factor))
    
    # Turn NA into an explicit "No" factor level in every s_ column
    dat %>%
      mutate(across(starts_with("s_"), ~ fct_na_value_to_level(.x, "No")))
    
    # A tibble: 10 x 6
          id     x s_0   s_1   s_2   s_3  
       <dbl> <dbl> <fct> <fct> <fct> <fct>
     1     1     5 75    A     4     110  
     2     2     9 36    No    No    921  
     3     3    11 13    B     7     769  
     4     4    11 34    C     2     912  
     5     5    11 No    C     No    835  
     6     6    13 39    No    4     No   
     7     7    14 45    B     4     577  
     8     8    19 42    D     6     No   
     9     9    20 4     No    7     577  
    10    10    13 28    No    3     573
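
  • If you would rather not depend on forcats, the same idea can be sketched in base R. This is an alternative technique, not part of the answer above: it uses addNA() to make NA a level and then renames that level. The helper name na_to_level and the data frame construction are assumptions made for illustration; the values are copied from the question.

```r
# Sample data rebuilt from the question; the s_ columns are stored as factors
dat <- data.frame(
  id  = 1:10,
  x   = c(5, 9, 11, 11, 11, 13, 14, 19, 20, 13),
  s_0 = factor(c(75, 36, 13, 34, NA, 39, 45, 42, 4, 28)),
  s_1 = factor(c("A", NA, "B", "C", "C", NA, "B", "D", NA, NA)),
  s_2 = factor(c(4, NA, 7, 2, NA, 4, 4, 6, 7, 3)),
  s_3 = factor(c(110, 921, 769, 912, 835, NA, 577, NA, 577, 573))
)

# Hypothetical helper (not from any package): make NA an explicit level
na_to_level <- function(f, level = "No") {
  f <- addNA(f)                          # add NA as a factor level
  levels(f)[is.na(levels(f))] <- level   # rename that NA level
  f
}

# Apply the helper to every column whose name starts with "s_"
s_cols <- grep("^s_", names(dat), value = TRUE)
dat[s_cols] <- lapply(dat[s_cols], na_to_level)
```

    Note that addNA() adds the level even to a column that has no missing values, so every s_ column ends up with a "No" level whether or not it is used.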