How to get switch() to handle NA?


Okay, I have to recode a df, because I want factors as integers:

library(dplyr)

load(url('http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/crash2.rda'))

df <- crash2 %>% select(source, sex)

df$source <- sapply(df$source, switch,
                    "telephone" = 1,
                    "telephone entered manually" = 2,
                    "electronic CRF by email" = 3,
                    "paper CRF enteredd in electronic CRF" = 4,
                    "electronic CRF" = 5,
                    NA)

This works as intended, but there are NAs in the next variable (sex) and things get complicated:

df$sex <- sapply(df$sex, switch, "male" = 1, "female" = 2, NA)

This returns a list with the NAs switched to oblivion. Using unlist() on it returns a vector that is too short for the df:

length(unlist(sapply(df$sex, switch, "male" = 1, "female" = 2, NA)))

The length should be 20207, but it is 20206.

What I want is a vector matching the df by returning NAs as NAs.

Besides a working solution, I would be especially grateful for an explanation of where I went wrong and how the code actually works.
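
A hedged reading of where this goes wrong: df$sex is a factor, and switch() appears to treat a factor element as its integer code rather than its label (R warns about this). A numeric NA selects no alternative, so switch() returns NULL instead of falling through to the unnamed default, and unlist() then drops those NULL entries. The sketch below reproduces both effects on a small made-up factor (not the crash2 data) and shows one possible fix, a named lookup vector. The names sex, codes and sex_int are illustrative only.

```r
# Toy stand-in for df$sex (hypothetical data, not the crash2 file).
sex <- factor(c("male", "female", NA, "male"), levels = c("male", "female"))

# A numeric NA selects no alternative, so switch() yields NULL.
is.null(switch(NA_integer_, 1, 2))   # TRUE

# unlist() silently drops NULL entries, hence the shorter vector.
length(unlist(list(1, NULL, 2)))     # 2

# One possible fix: look the codes up by name; a missing name gives NA.
codes <- c(male = 1, female = 2)
sex_int <- unname(codes[as.character(sex)])
sex_int                              # 1 2 NA 1
```

Because the lookup is vectorised, the result always has the same length as the input, so it can be assigned straight back into the data frame.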

Edit: Thank you for all your answers. As is so often the case, there is an even more efficient solution I should have noticed myself (well, I did notice it by myself, but too late, obviously):

> str(df$sex)
Factor w/ 2 levels "male","female": 1 2 1 1 2 1 1 1 1 1 ...

So I can just use as.numeric() to get what I want.
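
A minimal sketch of that shortcut, again on a made-up factor rather than the real data. It uses as.integer() instead of as.numeric() only so that the result is an integer vector; both return the underlying level codes. This assumes the levels are already in the desired order ("male" first, "female" second), as the str() output above suggests.

```r
# Hypothetical toy factor with the same level order as str(df$sex) shows.
sex <- factor(c("male", "female", NA, "male"), levels = c("male", "female"))

# as.integer() returns the underlying level codes and keeps NA as NA.
sex_int <- as.integer(sex)
sex_int  # 1 2 NA 1
```

The codes follow the level order, not the labels, so a factor whose levels are stored in a different order would give different numbers.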


Solution

  • If you're interested, there's also a dplyr way of doing this with case_when():

    library(dplyr)  # provides %>%, select(), mutate() and case_when()

    load(url('http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/crash2.rda'))

    # Values that match no condition (including NA) come back as NA
    df <- crash2 %>% dplyr::select(source, sex) %>% 
      mutate(
        source = case_when(
          source == "telephone" ~ 1, 
          source == "telephone entered manually" ~ 2, 
          source == "electronic CRF by email" ~ 3, 
          source == "paper CRF enteredd in electronic CRF" ~ 4, 
          source == "electronic CRF" ~ 5), 
        sex = case_when(
          sex == "male" ~ 1, 
          sex == "female" ~ 2))
    
    table(df$sex, useNA="ifany")
    #     1     2  <NA> 
    # 16935  3271     1