Tags: r, stata, r-haven

Convert variable label for labeled numeric variable to a new character variable


When a Stata dataset is imported into R, its numeric variables often carry helpful value labels. I would like to copy those labels into a new, separate character variable. The equivalent command in Stata is `decode`.

library(tidyverse)
library(webuse)
auto <- webuse("auto")
auto$foreign #Want to convert this to a character variable that reads "Domestic" or "Foreign"
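If `webuse` is unavailable (it downloads the data), a vector shaped like `auto$foreign` can be built by hand with `haven::labelled()`. The sketch below is a hypothetical stand-in that assumes the 0 = Domestic, 1 = Foreign coding seen in the outputs of the answer. It shows a `decode`-style conversion using haven's `as_factor()` followed by `as.character()`, which is one way to do it and not necessarily the method used in the answer.

```r
library(haven)

# Hypothetical stand-in for auto$foreign: numeric codes with Stata-style value labels
foreign <- labelled(
  c(0, 0, 1, 1, 0),
  labels = c(Domestic = 0, Foreign = 1),
  label = "Car type"
)

# decode-like step: map each code to its label, then drop the factor class
foreign_str <- as.character(as_factor(foreign))
foreign_str
```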

Solution

  • One option is to use the labelled package, e.g.

    library(tidyverse)
    #install.packages("webuse")
    library(webuse)
    #install.packages("labelled")
    library(labelled)
    
    auto <- webuse("auto")
    auto$foreign
    auto$labels <- labelled::to_factor(auto$foreign, levels = "labels")
    auto$labels
    #>[1] Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic
    #>[13] Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic
    #>[25] Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic
    #>[37] Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic Domestic
    #>[49] Domestic Domestic Domestic Domestic Foreign  Foreign  Foreign  Foreign  Foreign  Foreign  Foreign  Foreign 
    #>[61] Foreign  Foreign  Foreign  Foreign  Foreign  Foreign  Foreign  Foreign  Foreign  Foreign  Foreign  Foreign 
    #>[73] Foreign  Foreign 
    #>attr(,"label")
    #>[1] Car type
    #>Levels: Domestic Foreign
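`to_factor()` returns a factor, while the question asks for a character variable like the one Stata's `decode` produces. Wrapping the result in `as.character()` should close that gap. The sketch below uses a small hypothetical labelled vector in place of `auto$foreign` so it runs without downloading data.

```r
library(haven)
library(labelled)

# Hypothetical labelled vector standing in for auto$foreign
foreign <- haven::labelled(c(0, 1, 0), labels = c(Domestic = 0, Foreign = 1))

# to_factor() gives a factor; as.character() turns it into a plain character vector
foreign_chr <- as.character(labelled::to_factor(foreign, levels = "labels"))
foreign_chr
```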
    
    

    Or, to keep the values as well as the labels:

    auto$labels <- labelled::to_factor(auto$foreign, levels = "prefixed")
    auto$labels
    #>[1] [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic
    #>[9] [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic
    #>[17] [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic
    #>[25] [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic
    #>[33] [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic
    #>[41] [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic [0] Domestic
    #>[49] [0] Domestic [0] Domestic [0] Domestic [0] Domestic [1] Foreign  [1] Foreign  [1] Foreign  [1] Foreign 
    #>[57] [1] Foreign  [1] Foreign  [1] Foreign  [1] Foreign  [1] Foreign  [1] Foreign  [1] Foreign  [1] Foreign 
    #>[65] [1] Foreign  [1] Foreign  [1] Foreign  [1] Foreign  [1] Foreign  [1] Foreign  [1] Foreign  [1] Foreign 
    #>[73] [1] Foreign  [1] Foreign 
    #>attr(,"label")
    #>[1] Car type
    #>Levels: [0] Domestic [1] Foreign
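Stata's `decode` leaves the original numeric variable in place and adds the labels as a new string variable. A rough R analogue, sketched here under the assumption that two plain columns are wanted instead of one combined "[0] Domestic" level, keeps the stripped numeric codes (via haven's `zap_labels()`) next to the decoded labels. The vector is again a hypothetical stand-in for `auto$foreign`.

```r
library(haven)

# Hypothetical labelled vector standing in for auto$foreign
foreign <- labelled(c(0, 1, 1), labels = c(Domestic = 0, Foreign = 1))

# Numeric codes and their labels side by side, as two ordinary columns
df <- data.frame(
  foreign = as.numeric(zap_labels(foreign)),
  foreign_label = as.character(as_factor(foreign)),
  stringsAsFactors = FALSE
)
df
```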
    

    Edit

    To do the same inside a dplyr `mutate()` call:

    library(tidyverse)
    #install.packages("webuse")
    library(webuse)
    #install.packages("labelled")
    library(labelled)
    
    auto <- webuse("auto")
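    auto %>% 
      # refer to the column by name; auto$foreign bypasses the piped data
      # and breaks once rows are filtered or grouped
      mutate(labels = labelled::to_factor(foreign, levels = "labels")) %>% 
      select(labels)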
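When a dataset has many labelled columns, the same idea can be applied to all of them at once with `across()`. The sketch below is illustrative only: the data frame and its `rating` column are hypothetical, and it assumes dplyr 1.0 or later for `across()` and `where()`. Each labelled column gets a new character column with a `_label` suffix, leaving the original codes untouched.

```r
library(dplyr)
library(haven)

# Hypothetical data frame with two labelled columns, as read_dta() might return
df <- tibble(
  foreign = labelled(c(0, 1), labels = c(Domestic = 0, Foreign = 1)),
  rating  = labelled(c(3, 5), labels = c(Average = 3, Excellent = 5))
)

# Decode every labelled column into a new character column named <col>_label
decoded <- df %>%
  mutate(across(where(is.labelled),
                ~ as.character(as_factor(.x)),
                .names = "{.col}_label"))
decoded
```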