Tags: r, dplyr, data-cleaning, haven

Relabelling haven labelled data at scale


Suppose I have the following tibble, built with the tibble and haven packages:

library(tibble)
library(haven)

# Numeric codes shared by both sets of value labels
values <- 1:5

# Named vectors mapping each label (name) to its numeric code (value)
color_choices <- setNames(values, c("Don't know", "Red", "Blue", "Green", "Yellow"))
name_choices <- setNames(values, c("Don't know", "John", "Paul", "Ringo", "George"))

# Create a tibble with two haven-labelled columns
data <- tibble(respondent_ID = 1:10,
               colour_choice = labelled(sample(1:5, 10, replace = TRUE), labels = color_choices),
               name_choice  = labelled(sample(1:5, 10, replace = TRUE), labels = name_choices))

data

Now I want to change the haven value labels of some of the variables. Specifically, I only want to change the label for value = 1 from "Don't know" to "Not sure", but I want to do this across multiple variables.

This achieves the result I want for the colour_choice variable:

data_replaced <- data
color_choices2 <- color_choices

names(color_choices2)[1] <- "Not sure"
val_labels(data_replaced$colour_choice) <- color_choices2

However, this is tedious for two reasons. First, even for one variable it is inefficient: it means building a new named vector of all the variable's labels when only one needs replacing. [val_labels(data_replaced$colour_choice) <- c("Not sure" = 1) removes the other labels.] Second, it does not scale to many variables.

I have been experimenting with a dplyr approach (which would be preferable anyway) using the memisc::relabel function, but I keep hitting walls. Can anyone suggest a way to do this? Here is where I am so far:

data_replaced <- data %>%
  mutate_at(vars(colour_choice, name_choice), ~ memisc::relabel(., "Don't know" = "Not sure"))

Solution

  • You can write a helper function around the replacement function val_label<- from the labelled package (labelled::`val_label<-`), then call it inside mutate(across()):

    library(dplyr)
    library(labelled)
    set.seed(13) # for OP's sample data; set this before creating `data`
    
    # Replace the label attached to `value`, leaving all other labels intact
    change_value_label <- function(x, value, new_label) {
      val_label(x, value) <- new_label
      x
    }
    
    data %>% 
      mutate(across(
        colour_choice:name_choice, 
        \(x) change_value_label(x, 1, "Not sure")
      ))
    
    # # A tibble: 10 × 3
    #    respondent_ID colour_choice name_choice 
    #            <int> <int+lbl>     <int+lbl>   
    #  1             1 3 [Blue]      5 [George]  
    #  2             2 5 [Yellow]    4 [Ringo]   
    #  3             3 2 [Red]       1 [Not sure]
    #  4             4 5 [Yellow]    5 [George]  
    #  5             5 4 [Green]     1 [Not sure]
    #  6             6 5 [Yellow]    4 [Ringo]   
    #  7             7 4 [Green]     1 [Not sure]
    #  8             8 3 [Blue]      3 [Paul]    
    #  9             9 1 [Not sure]  4 [Ringo]   
    # 10            10 2 [Red]       5 [George]  
    

    You could tweak the function to take the label to be changed rather than its value (e.g., swap_value_label(x, "Don't know", "Not sure")). This has the added benefit of working even when the label to be changed is not attached to the same value in every variable.

    swap_value_label <- function(x, old_label, new_label) {
      # Look up the value currently attached to old_label
      value <- val_labels(x)[[old_label]]
      # Attach new_label to that value instead
      val_label(x, value) <- new_label
      x
    }
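
As a rough sketch of how the label-based version might be applied at scale: val_labels(x)[[old_label]] raises a "subscript out of bounds" error when a variable has no label called old_label, so the variant below returns such a variable unchanged. The name swap_value_label_safe and the toy tibble are made up for illustration and are not part of the labelled package. where(is.labelled) is used here to pick up every haven-labelled column rather than naming the columns.

```r
library(dplyr)
library(haven)
library(labelled)

# Hypothetical helper: like swap_value_label(), but returns x untouched
# when old_label is not among its value labels instead of erroring.
swap_value_label_safe <- function(x, old_label, new_label) {
  labels <- val_labels(x)
  if (!old_label %in% names(labels)) {
    return(x)
  }
  val_label(x, labels[[old_label]]) <- new_label
  x
}

# Toy data (an assumption for illustration): "Don't know" is coded 1 in
# one column and 9 in another, and a third column has no such label.
toy <- tibble::tibble(
  id     = 1:3,
  colour = labelled(c(1L, 2L, 1L), labels = c("Don't know" = 1L, "Red" = 2L)),
  name   = labelled(c(9L, 1L, 9L), labels = c("John" = 1L, "Don't know" = 9L)),
  agree  = labelled(c(1L, 0L, 1L), labels = c("No" = 0L, "Yes" = 1L))
)

# Relabel every haven-labelled column; `id` is skipped by where()
toy_relabelled <- toy %>%
  mutate(across(
    where(is.labelled),
    \(x) swap_value_label_safe(x, "Don't know", "Not sure")
  ))

# Inspect the resulting value labels
val_labels(toy_relabelled$colour)
val_labels(toy_relabelled$name)
val_labels(toy_relabelled$agree)
```

Selecting with where(is.labelled) is convenient when there are many labelled columns, but it touches all of them, so the guard against a missing label matters more here than with an explicit column range such as colour_choice:name_choice.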