Efficient way to conditionally edit value labels


I'm working with survey data containing value labels. The haven package allows one to import data with value label attributes. Sometimes these value labels need to be edited in routine ways.

The example I'm giving here is very simple, but I'm looking for a solution that can be applied to similar problems across large data.frames.

d <- dput(structure(list(var1 = structure(c(1, 2, NA, NA, 3, NA, 1, 1), labels = structure(c(1, 
2, 3, 8, 9), .Names = c("Protection of environment should be given priority", 
"Economic growth should be given priority", "[DON'T READ] Both equally", 
"[DON'T READ] Don't Know", "[DON'T READ] Refused")), class = "labelled")), .Names = "var1", row.names = c(NA, 
-8L), class = c("tbl_df", "tbl", "data.frame")))

d$var1
<Labelled double>
[1]  1  2 NA NA  3 NA  1  1

Labels:
 value                                              label
     1 Protection of environment should be given priority
     2           Economic growth should be given priority
     3                          [DON'T READ] Both equally
     8                            [DON'T READ] Don't Know
     9                               [DON'T READ] Refused

If a value label begins with "[DON'T READ]" I want to remove "[DON'T READ]" from the beginning of the label and add "(VOL)" at the end. So, "[DON'T READ] Both equally" would now read "Both equally (VOL)."

Of course, it's straightforward to edit this individual variable with a function from haven's associated labelled package. But I want to apply this solution across all the variables in a data.frame.

library(labelled)
val_labels(d$var1) <- c("Protection of environment should be given priority" = 1,
                           "Economic growth should be given priority" = 2,
                           "Both equally (VOL)" = 3,
                           "Don't Know (VOL)" = 8,
                           "Refused (VOL)" = 9)

How can I achieve the result of the function directly above in a way that can be applied to every variable in a data.frame?

The solution must work regardless of the specific values involved. (In this instance it is values 3, 8, and 9 that need alteration, but that will not always be the case.)


Solution

  • There are a few ways to do this. You could use lapply() or, if you want a one(ish)-liner, any of the scoped variants of mutate():

    1) Using lapply()

    This method loops over all columns, using gsub() to strip the "[DON'T READ]" prefix from each label name and append " (VOL)" to the end. You can also apply it to a subset of columns.

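    d[] <- lapply(d, function(x) {
      # pull the named vector of value labels off the column
      labels <- attributes(x)$labels
      # drop the "[DON'T READ]" prefix and append " (VOL)" to matching names
      names(labels) <- gsub("\\[DON'T READ\\]\\s*(.*)", "\\1 (VOL)", names(labels))
      # write the edited labels back and return the column
      attributes(x)$labels <- labels
      x
    })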
    
    d$var1
    [1]  1  2 NA NA  3 NA  1  1
    attr(,"labels")
    Protection of environment should be given priority           Economic growth should be given priority 
                                                     1                                                  2 
                                    Both equally (VOL)                                   Don't Know (VOL) 
                                                     3                                                  8 
                                         Refused (VOL) 
                                                     9 
    attr(,"class")
    [1] "labelled"
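To see what the pattern does on its own, here is a minimal sketch on a plain character vector, using only base R. The vector name `labs` is made up for illustration. It also anchors the pattern with `^`, a slight tightening of the pattern above, so that only labels that begin with the marker are touched.

```r
# Sample label names copied from the example data
labs <- c("Protection of environment should be given priority",
          "[DON'T READ] Both equally",
          "[DON'T READ] Don't Know")

# ^ anchors the match at the start of the string; the capture group (.*)
# keeps the rest of the label, and "\\1 (VOL)" puts it back with the suffix.
new_labs <- sub("^\\[DON'T READ\\]\\s*(.*)$", "\\1 (VOL)", labs)

print(new_labs)
```

Labels without the marker do not match the pattern, so sub() returns them unchanged.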
    

    2) Using mutate_all()

    Using the same logic (with the same result), you could rename the labels in a tidier way. (Note that in dplyr 1.0.0 and later, mutate_all() is superseded by across().)

    library(dplyr)
    library(purrr)

    d %>%
      mutate_all(~{names(attributes(.)$labels) <- gsub("\\[DON'T READ\\]\\s*(.*)", "\\1 (VOL)", names(attributes(.)$labels));.}) %>%
      map(attributes) # just to check on the result; drop this step and assign back to d to keep the change
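Either approach can be wrapped in a small reusable function. What follows is only a sketch under some assumptions: `relabel_vol()` is a made-up helper name (not part of haven or labelled), the toy data frame `df` stands in for real survey data, and a plain numeric vector with a "labels" attribute stands in for a haven-labelled column. The helper returns columns that carry no "labels" attribute untouched, so it can be run over a data frame that mixes labelled and unlabelled variables.

```r
# Hypothetical helper: rewrite "[DON'T READ] xyz" label names as "xyz (VOL)"
relabel_vol <- function(x) {
  labs <- attr(x, "labels", exact = TRUE)
  if (is.null(labs)) return(x)  # column has no value labels: leave it alone
  names(labs) <- sub("^\\[DON'T READ\\]\\s*(.*)$", "\\1 (VOL)", names(labs))
  attr(x, "labels") <- labs
  x
}

# Toy data: one unlabelled column and one column with value labels
df <- data.frame(id = 1:3)
df$q1 <- structure(c(1, 8, NA),
                   labels = c("Agree" = 1, "[DON'T READ] Don't Know" = 8))

# Apply to every column; df[] keeps the data.frame structure
df[] <- lapply(df, relabel_vol)

print(attr(df$q1, "labels"))
```

To restrict the change to some columns, the same call should work on a subset, along the lines of `df[cols] <- lapply(df[cols], relabel_vol)` for a character vector `cols` of column names.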