Okay, I have to recode a df because I want its factors as integers:
library(dplyr)
load(url('http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/crash2.rda'))
df <- crash2 %>% select(source, sex)
df$source <- sapply(df$source, switch, "telephone" = 1, "telephone entered manually" = 2, "electronic CRF by email" = 3, "paper CRF enteredd in electronic CRF" = 4, "electronic CRF" = 5, NA)
This works as intended, but there are NAs in the next variable (sex) and things get complicated:
df$sex <- sapply(df$sex, switch, "male" = 1, "female" = 2, NA)
This returns a list in which the NAs have been switched to oblivion. Using unlist() returns a vector that is too short for the df:
length(unlist(sapply(df$sex, switch, "male" = 1, "female" = 2, NA)))
This should be 20207, but it is 20206.
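The problem can be reproduced on a small, made-up factor. The values below are hypothetical and are not taken from crash2. sapply() works on as.list(df$sex), so switch() receives one-element factors. According to ?switch, a factor is coerced to its integer code (with a warning), and the alternative at that position is chosen, so the names "male"/"female" are not actually matched. An NA code selects no alternative, and switch() returns NULL. Because of that NULL, sapply() cannot simplify the results to a vector and returns a list, and unlist() then drops the NULL.

```r
# Hypothetical toy factor standing in for crash2$sex
sex <- factor(c("male", "female", NA, "male"), levels = c("male", "female"))

# sapply() iterates over as.list(sex), so switch() sees one-element factors.
# switch() coerces a factor to its integer code (with a warning) and picks the
# alternative at that position; an NA code selects nothing and yields NULL.
res <- suppressWarnings(sapply(sex, switch, "male" = 1, "female" = 2, NA))

class(res)          # "list": the NULL element prevents simplification
length(unlist(res)) # 3, not 4: unlist() silently drops the NULL
```

The same positional behaviour applies to the source recoding. Its result only lines up with the names if the alternatives are listed in the same order as levels(df$source).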
What I want is a vector with the same length as the df, with NAs returned as NAs.
Besides a working solution, I would be extra thankful for an explanation of where I went wrong and how the code actually works.
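If you want to keep the sapply()/switch() idea, one possible repair is sketched below. This is a sketch under assumptions, not the accepted answer. It converts the factor with as.character() so that switch() matches by name, handles NA before switch() is called, and uses vapply() so that every element must return exactly one number. The toy data and the helper name recode_sex are hypothetical.

```r
# Hypothetical toy factor standing in for crash2$sex
sex <- factor(c("male", "female", NA, "male"), levels = c("male", "female"))

# Hypothetical helper: maps one label to one number, NA included
recode_sex <- function(s) {
  if (is.na(s)) return(NA_real_)                 # handle missing labels up front
  switch(s, "male" = 1, "female" = 2, NA_real_)  # unnamed last value = default
}

# as.character() makes switch() match by name instead of by level position;
# vapply() requires exactly one numeric per element, so lengths always agree.
codes <- vapply(as.character(sex), recode_sex, numeric(1), USE.NAMES = FALSE)
codes  # 1 2 NA 1
```

If some element returned NULL or more than one value, vapply() would stop with an error instead of returning a shorter vector without warning.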
Edit: Thank you for all your answers. As is so often the case, there is an even more efficient solution I should have noticed myself (well, I did notice it myself, but too late, obviously):
>str(df$sex)
Factor w/ 2 levels "male","female": 1 2 1 1 2 1 1 1 1 1 ...
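So I can just use as.numeric() to get what I want: the factor's underlying integer codes are already 1 for "male" and 2 for "female", and NA stays NA.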
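A small illustration of why this works, and of its main caveat, is below. It uses hypothetical toy factors rather than crash2. as.integer() and as.numeric() return the internal level codes, so the result has the same length as the input and NA is preserved. The codes follow the order of levels(), though, not the labels themselves.

```r
# Hypothetical toy factor standing in for crash2$sex
sex <- factor(c("male", "female", NA, "male"), levels = c("male", "female"))

as.integer(sex)  # 1 2 NA 1: codes follow levels(sex), NA stays NA
as.numeric(sex)  # same values, stored as doubles

# Caveat: the codes come from the level order, not from the labels
sex_rev <- factor(c("male", "female"), levels = c("female", "male"))
as.integer(sex_rev)  # 2 1
```

For the same reason, if a factor's labels look like numbers (e.g. "10", "20"), as.numeric() still returns the codes. In that case as.numeric(as.character(x)) recovers the numbers written in the labels.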
If you're interested, there's also a dplyr way of doing this with case_when():
load(url('http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/crash2.rda'))
df <- crash2 %>% dplyr::select(source, sex) %>%
  mutate(source = case_when(
           source == "telephone" ~ 1,
           source == "telephone entered manually" ~ 2,
           source == "electronic CRF by email" ~ 3,
           source == "paper CRF enteredd in electronic CRF" ~ 4,
           source == "electronic CRF" ~ 5),
         # values matching no condition, including NA, become NA
         sex = case_when(
           sex == "male" ~ 1,
           sex == "female" ~ 2))
table(df$sex, useNA="ifany")
# 1 2 <NA>
# 16935 3271 1
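For comparison, the same label-to-code mapping as the case_when() call can be written in base R with a named lookup vector that is indexed by the factor's labels. This is a minimal sketch on hypothetical toy data, and sex_codes is an illustrative name. Like case_when(), it depends on the labels rather than the level order. NA labels, and labels missing from the table, come back as NA.

```r
# Hypothetical toy factor standing in for crash2$sex
sex <- factor(c("male", "female", NA, "male"), levels = c("male", "female"))

# Hypothetical lookup table: names are the labels, values are the codes
sex_codes <- c("male" = 1, "female" = 2)

# Indexing by label returns one value per element; an NA label gives NA.
# unname() drops the names that indexing carries over from the lookup table.
recoded <- unname(sex_codes[as.character(sex)])
recoded  # 1 2 NA 1
```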