
Recode missing values in multiple columns: mutate with across and ifelse


I am working with an SPSS file that has been exported as tab-delimited text. In SPSS, you can define specific values as different types of missing data, and this dataset uses 98 and 99 as missing-value codes.

I want to convert them to NA but only in certain columns (V2 and V3 in the example data, leaving V1 and V4 unchanged).

library(dplyr)
testdf <- data.frame(V1 = c(1, 2, 3, 4),
                     V2 = c(1, 98, 99, 2),
                     V3 = c(1, 99, 2, 3),
                     V4 = c(98, 99, 1, 2))
outdf <- testdf %>% 
  mutate(across(V2:V3), . = ifelse(. %in% c(98,99), NA, .))

I haven't used across() before and cannot work out how to get mutate() to write the ifelse() result back into the same columns. I suspect I am overthinking this, but I can't find any similar examples that combine across() and ifelse(). I need a tidyverse answer, preferably dplyr or tidyr.


Solution

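  • The syntax needs two changes to work (see ?across for details):

    1. Turn the expression into a function: use ~ to write a purrr-style lambda (where . or .x is the current column), or use a base R function such as \(x) ifelse(x %in% c(98,99), NA, x) or function(x) ifelse(x %in% c(98,99), NA, x).
    2. Pass that function inside across(), as its second argument (.fns). In the question, across(V2:V3) is closed before the function, so across() has nothing to apply to V2 and V3.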
    library(dplyr)
    testdf %>% 
      mutate(across(V2:V3, ~ ifelse(. %in% c(98,99), NA, .)))
    
    #   V1 V2 V3 V4
    # 1  1  1  1 98
    # 2  2 NA NA 99
    # 3  3 NA  2  1
    # 4  4  2  3  2 
    

    An alternative is base R's replace(), which sets the elements where the condition is TRUE to NA:

    testdf %>% 
      mutate(across(V2:V3, ~ replace(., . %in% c(98,99), NA)))
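A minimal self-contained sketch, assuming dplyr >= 1.0.0 (where across() was introduced) and R >= 4.1.0 (for the \(x) shorthand). It checks that the three function forms from point 1 and the replace() variant all recode V2 and V3 the same way while leaving V1 and V4 alone. The name missing_codes is illustrative and not part of the original answer; c(V2, V3) is simply another way to select the same columns as V2:V3.

```r
library(dplyr)

testdf <- data.frame(V1 = c(1, 2, 3, 4),
                     V2 = c(1, 98, 99, 2),
                     V3 = c(1, 99, 2, 3),
                     V4 = c(98, 99, 1, 2))

# Illustrative name for the SPSS-style missing-value codes
missing_codes <- c(98, 99)

# 1. purrr-style formula lambda: .x (or .) is the current column
out_formula <- testdf %>%
  mutate(across(V2:V3, ~ ifelse(.x %in% missing_codes, NA, .x)))

# 2. Base R anonymous-function shorthand (requires R >= 4.1.0)
out_backslash <- testdf %>%
  mutate(across(V2:V3, \(x) ifelse(x %in% missing_codes, NA, x)))

# 3. Classic base R anonymous function
out_function <- testdf %>%
  mutate(across(V2:V3, function(x) ifelse(x %in% missing_codes, NA, x)))

# replace() variant: sets the matching positions of each column to NA
out_replace <- testdf %>%
  mutate(across(c(V2, V3), \(x) replace(x, x %in% missing_codes, NA)))

# All four approaches should produce the same data frame
stopifnot(identical(out_formula, out_backslash),
          identical(out_formula, out_function),
          identical(out_formula, out_replace))

# Only V2 and V3 change; V1 and V4 keep their original values
print(out_replace)
```

For these numeric columns either approach works. The difference matters for other column types: ifelse() takes its attributes from the logical test, so a class such as Date can be lost, whereas replace() assigns into a copy of the column and keeps its class.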