How do I set missing values for multiple labelled vectors in a data frame?
I am working with a survey dataset from SPSS. I am dealing with about 20 different variables that share the same missing values, so I would like to find a way to use lapply() to make this work, but I can't.
I can already do this by converting with as.numeric() and then using recode(), but I'm intrigued by the possibilities of haven and the labelled class, so I'd like to find a way to do this all within Hadley's tidyverse.
Roughly, the variables of interest look like this. I am sorry if this is a basic question, but I find the help documentation for the haven and labelled packages very unhelpful.
library(haven)
library(labelled)
v1<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
v2<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
v3<-data.frame(v1=v1, v2=v2)
lapply(v3, val_labels)
lapply(v3, function(x) set_na_values(x, c(5,6)))
The first argument to set_na_values() is a data frame, not a vector/column, which is why your lapply() command doesn't work. You could build a list of the arguments for set_na_values() for an arbitrary number of columns in your data frame and then call it with do.call(), as below:
v1<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
v2<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
v3<-data.frame(v1=v1, v2=v2)
na_values(v3)
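# one named argument per column, e.g. list(.data = v3, v1 = c(5,6), v2 = c(5,6))
args <- c(list(.data = v3), setNames(lapply(names(v3), function(x) c(5,6)), names(v3)))
# call set_na_values(.data = v3, v1 = c(5,6), v2 = c(5,6))
v3 <- do.call(set_na_values, args)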
na_values(v3)
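The do.call() above just splices one named argument per column into set_na_values(). For the two-column example it is equivalent to writing the call out by hand, which can help when checking what the generated list contains. A minimal sketch, assuming the labelled package's set_na_values(.data, ...) interface, where each named argument is a column name:

```r
library(haven)
library(labelled)

v1 <- labelled(c(1, 2, 2, 2, 5, 6), c(agree = 1, disagree = 2, dk = 5, refused = 6))
v2 <- labelled(c(1, 2, 2, 2, 5, 6), c(agree = 1, disagree = 2, dk = 5, refused = 6))
v3 <- data.frame(v1 = v1, v2 = v2)

# The argument list that do.call() receives: .data plus one entry per column
args <- c(list(.data = v3), setNames(lapply(names(v3), function(x) c(5, 6)), names(v3)))
str(args[-1])

# The same call written out by hand, next to the generated one
by_hand     <- set_na_values(v3, v1 = c(5, 6), v2 = c(5, 6))
via_do_call <- do.call(set_na_values, args)

# Both should record 5 and 6 as user-defined missing values in every column
na_values(by_hand)
identical(na_values(by_hand), na_values(via_do_call))
```

Writing it by hand is fine for a couple of columns; the do.call() version only pays off once there are many columns sharing the same codes.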
Update: You can also use the assignment form of the na_values function, `na_values<-`, within an lapply() statement, since it accepts a vector as its first argument rather than a data frame like set_na_values():
library(haven)
library(labelled)
v1<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
v2<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
v3<-data.frame(v1=v1, v2=v2)
lapply(v3, val_labels)
na_values(v3)
v3[] <- lapply(v3, function(x) `na_values<-`(x, c(5,6)))
na_values(v3)
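Because `na_values<-` is an ordinary R replacement function, calling it with backticks (as inside the lapply() above) gives the same result as the familiar `na_values(x) <- value` form on a single vector. A short sketch on one vector; it assumes haven's documented behaviour that is.na() reports user-defined missing values of a haven_labelled_spss vector as TRUE:

```r
library(haven)
library(labelled)

v1 <- labelled(c(1, 2, 2, 2, 5, 6), c(agree = 1, disagree = 2, dk = 5, refused = 6))

# Functional form: returns a modified copy and leaves v1 untouched
a <- `na_values<-`(v1, c(5, 6))

# Replacement form: modifies b itself
b <- v1
na_values(b) <- c(5, 6)

identical(a, b)   # both forms build the same object
class(a)          # the vector is now a haven_labelled_spss
is.na(a)          # the values 5 ("dk") and 6 ("refused") are flagged as missing
na_values(v1)     # the original vector has no missing values set
```

This is also why the lapply() call has to assign its result back with `v3[] <-`: the functional form never changes its input.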
Or use the ordinary replacement syntax, na_values(x) <- c(5,6), inside the lapply() call, making sure the function returns the 'fixed' vector:
library(haven)
library(labelled)
v1<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
v2<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
v3<-data.frame(v1=v1, v2=v2)
lapply(v3, val_labels)
na_values(v3)
v3[] <- lapply(v3, function(x) { na_values(x) <- c(5,6); x } )
na_values(v3)
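In a real survey file the 20 or so items usually sit next to other columns that should not be touched. The same lapply() pattern works on a subset of columns by indexing with a character vector of names. A sketch using hypothetical columns q1 and q2 (labelled items) and age (a plain numeric column that is left alone):

```r
library(haven)
library(labelled)

agree_labels <- c(agree = 1, disagree = 2, dk = 5, refused = 6)

# Hypothetical survey: two labelled items plus an unlabelled numeric column
survey <- data.frame(
  q1  = labelled(c(1, 2, 5, 6), agree_labels),
  q2  = labelled(c(2, 2, 1, 5), agree_labels),
  age = c(34, 51, 27, 45)
)

# Hypothetical list of the items that share the dk/refused codes
items <- c("q1", "q2")

# Assigning into survey[items] keeps the data frame and leaves age unchanged
survey[items] <- lapply(survey[items], function(x) {
  na_values(x) <- c(5, 6)
  x
})

na_values(survey)
```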
The same idea can be used inside a dplyr chain as well, applied either to all variables or only to the columns chosen with dplyr's selection tools:
library(haven)
library(labelled)
library(dplyr)
v1<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
v2<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
v3<-data.frame(v1=v1, v2=v2)
lapply(v3, val_labels)
na_values(v3)
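# across() replaces the deprecated mutate_all()/mutate_each() + funs() idiom
# apply to every column
v4 <- v3 %>% mutate(across(everything(), ~ `na_values<-`(.x, c(5,6))))
na_values(v4)
# apply only to selected columns (here just v1)
v5 <- v3 %>% mutate(across(c(v1), ~ `na_values<-`(.x, c(5,6))))
na_values(v5)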
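dplyr's selection helpers can also pick the columns by type rather than by name, which avoids typing out 20 variable names. A sketch assuming dplyr 1.0 or later (for across() and where()) and haven's is.labelled() predicate; the columns q1, q2 and age are hypothetical:

```r
library(haven)
library(labelled)
library(dplyr)

agree_labels <- c(agree = 1, disagree = 2, dk = 5, refused = 6)

# Hypothetical survey: two labelled items plus an unlabelled numeric column
survey <- data.frame(
  q1  = labelled(c(1, 2, 5, 6), agree_labels),
  q2  = labelled(c(2, 2, 1, 5), agree_labels),
  age = c(34, 51, 27, 45)
)

# Every labelled column gets 5 and 6 as user-defined missing values;
# age is not labelled, so where(is.labelled) does not select it
survey <- survey %>%
  mutate(across(where(is.labelled), ~ `na_values<-`(.x, c(5, 6))))

na_values(survey)
```

Selecting by type only works if every labelled column really uses 5 and 6 as its missing codes; otherwise a name-based selection such as starts_with() is safer.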