I have a dataset with over 170
variables that looks as follows:
df <- data.frame(var1 = 1:3, var2 = 2:4, var3 = 2:4, var4 = 2:4, var5 = 2:4)
I have manually added variable values using Hmisc
:
library(Hmisc)
var.labels = c(var1 = "label 1",
var3 = "label 2",
var4="label 4" )
label(df) = as.list(var.labels[match(names(df), names(var.labels))])
Do note that not all variables have labels and that it is much easier for me to specify variable labels by name (var*
), rather than position given my large number of colums.
The problem is that when I save my dataset with write.csv(df,"df.csv")
or with write.dta(df,"df.dta")
my variable labels get lost.
How can I save the data in a way that can be re-imported into R and Stata with the labels restored?
It is a little tricky. You need to label all variables. If you do not want to label a variable, you may still need to put variable = ""
. Otherwise write.dta
will ignore all variable labels.
In R
df <- data.frame(var1 = 1:3, var2 = 2:4, var3 = 2:4, var4 = 2:4, var5 = 2:4)
attr(df, "var.labels") <- c(var1 = "label 1", var2 = "label 2",
var3 = "", var4="label 4", var5 = "")
foreign::write.dta(df, "dat_stata.dta")
In Stata
, you get:
. des
Contains data from C:...dat_stata.dta
obs: 3 Written by R.
vars: 5
size: 60
------------------------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
------------------------------------------------------------------------------------------------------
var1 long %9.0g label 1
var2 long %9.0g label 2
var3 long %9.0g
var4 long %9.0g label 4
var5 long %9.0g
------------------------------------------------------------------------------------------------------
Sorted by:
Please note: I used Stata 14
, and R
package foreign