Search code examples
rlabelstatadata-management

Save variable labels in R so that Stata can read them


I have a dataset with over 170 variables that looks as follows:

df <- data.frame(var1 = 1:3, var2 = 2:4, var3 = 2:4, var4 = 2:4, var5 = 2:4)

I have manually added variable values using Hmisc:

library(Hmisc)
var.labels = c(var1 = "label 1",
               var3 = "label 2",
               var4="label 4" )
label(df) = as.list(var.labels[match(names(df), names(var.labels))])              

Do note that not all variables have labels and that it is much easier for me to specify variable labels by name (var*), rather than position given my large number of colums.

The problem is that when I save my dataset with write.csv(df,"df.csv")or with write.dta(df,"df.dta") my variable labels get lost.

How can I save the data in a way that can be re-imported into R and Stata with the labels restored?


Solution

  • It is a little tricky. You need to label all variables. If you do not want to label a variable, you may still need to put variable = "". Otherwise write.dta will ignore all variable labels.

    In R

    df <- data.frame(var1 = 1:3, var2 = 2:4, var3 = 2:4, var4 = 2:4, var5 = 2:4)
    attr(df, "var.labels") <- c(var1 = "label 1", var2 = "label 2", 
                  var3 = "", var4="label 4", var5 = "")
    foreign::write.dta(df, "dat_stata.dta")
    
    

    In Stata, you get:

    . des
    
    Contains data from C:...dat_stata.dta
      obs:             3                          Written by R.              
     vars:             5                          
     size:            60                          
    ------------------------------------------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    ------------------------------------------------------------------------------------------------------
    var1            long    %9.0g                 label 1
    var2            long    %9.0g                 label 2
    var3            long    %9.0g                 
    var4            long    %9.0g                 label 4
    var5            long    %9.0g                 
    ------------------------------------------------------------------------------------------------------
    Sorted by: 
    

    Please note: I used Stata 14, and R package foreign