r tidyverse r-haven

set missing values for multiple labelled variables


How do I set missing values for multiple labelled vectors in a data frame? I am working with a survey dataset from SPSS. I am dealing with about 20 different variables, all with the same missing values, so I would like to find a way to use lapply() to make this work, but I can't.

I can actually do this with base R via as.numeric() and then recode(), but I'm intrigued by the possibilities of haven and the labelled class, so I'd like to find a way to do this all in Hadley's tidyverse.

Roughly, the variables of interest look like this. I am sorry if this is a basic question, but I find the help documentation associated with the haven and labelled packages just very unhelpful.

library(haven)
library(labelled)
v1<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
v2<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
v3<-data.frame(v1=v1, v2=v2)
lapply(v3, val_labels)
lapply(v3, function(x) set_na_values(x, c(5,6)))

Solution

  • The first argument to set_na_values is a data frame, not a vector/column, which is why your lapply command doesn't work: lapply hands the function one column at a time. You could build a list of the arguments for set_na_values (the data frame plus one name = values pair per column) for an arbitrary number of columns in your data frame and then call it with do.call, as below...

    v1<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
    v2<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
    v3<-data.frame(v1=v1, v2=v2)
    na_values(v3)
    
    args <- c(list(.data = v3), setNames(lapply(names(v3), function(x) c(5,6)), names(v3)))
    v3 <- do.call(set_na_values, args)
    na_values(v3)
    

    Update: You can also use the assignment form of the na_values function within an lapply statement, since it accepts a vector as its first argument instead of a data frame like set_na_values...

    library(haven)
    library(labelled)
    v1<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
    v2<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
    v3<-data.frame(v1=v1, v2=v2)
    lapply(v3, val_labels)
    na_values(v3)
    
    v3[] <- lapply(v3, function(x) `na_values<-`(x, c(5,6)))
    na_values(v3)
    

    or even use the normal version of na_values in the lapply command, just making sure to return the 'fixed' vector...

    library(haven)
    library(labelled)
    v1<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
    v2<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
    v3<-data.frame(v1=v1, v2=v2)
    lapply(v3, val_labels)
    na_values(v3)
    
    v3[] <- lapply(v3, function(x) { na_values(x) <- c(5,6); x } )
    na_values(v3)
    

    and that idea can be used inside of a dplyr chain as well, either applying to all variables, or applying to whatever columns are selected using dplyr's selection tools...

    library(haven)
    library(labelled)
    library(dplyr)
    v1<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
    v2<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
    v3<-data.frame(v1=v1, v2=v2)
    lapply(v3, val_labels)
    na_values(v3)
    
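    # mutate_all(), mutate_each() and funs() are deprecated; across() needs dplyr >= 1.0.0
    v4 <- v3 %>% mutate(across(everything(), ~ `na_values<-`(.x, c(5,6))))
    na_values(v4)
    
    # apply to selected columns only (here v1, since v3 has no column named x)
    v5 <- v3 %>% mutate(across(v1, ~ `na_values<-`(.x, c(5,6))))
    na_values(v5)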
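
As a quick sanity check on any of the approaches above, here is a minimal sketch of how one might confirm that codes 5 and 6 are now treated as missing. It reuses the lapply version and assumes a reasonably recent haven and labelled, where is.na() honours user-defined missing values and labelled provides user_na_to_na(); the names labs and v3_clean are only illustrative.

```r
library(haven)
library(labelled)

# Same toy data as in the question (labs is just a helper name for this sketch)
labs <- c(agree = 1, disagree = 2, dk = 5, refused = 6)
v3 <- data.frame(
  v1 = labelled(c(1, 2, 2, 2, 5, 6), labs),
  v2 = labelled(c(1, 2, 2, 2, 5, 6), labs)
)

# Declare 5 and 6 as user-defined missing values on every column
v3[] <- lapply(v3, function(x) {
  na_values(x) <- c(5, 6)
  x
})

# The declared missing values, one entry per column
na_values(v3)

# is.na() reports the user-defined missing codes as missing
is.na(v3$v1)

# Optionally convert the user-defined missing codes into regular NA
v3_clean <- user_na_to_na(v3)
sum(is.na(v3_clean$v1))
```

Keeping the codes as user-defined missing values preserves the distinction between "dk" and "refused"; converting to regular NA discards it, so that last step is only worth doing when the distinction is no longer needed.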