Search code examples
rr-haven

Combine two excel data sets with labels and values to one labeled SPSS


I have two excel files that originally came from an SPSS dataset. One has values and the other one holds the labels. Is there an easy way to combine both sources and save it as a labeled SPSS dataset? The input looks like that:

ds1 <- data.frame(ID=1:12, COUNTRY=rep(1:6, each =2), REGION=rep(1:2, each =6))
ds2 <- data.frame(ID=1:12, COUNTRY_lab=rep(c("US","CA","MX","DE","FR","IT"),each=2), REGION_lab=rep(c("NA","EU"), each =6))

And this is the structure what I want as outcome:

library(haven)
ds3 <- labelled(ds1$COUNTRY, c(US = 1, CA = 2, MX =3, DE = 4, FR = 5, IT = 6))
  1. How can I do the matching automatically?
  2. How can I do this for an entire dataset (not only one column like in my example)

Solution

  • You can create a named vector after merging the two datasets.

    library(haven)
    ds3 <- merge(ds1, ds2)
    ds4 <- labelled(ds3$COUNTRY, setNames(unique(ds3$COUNTRY), unique(ds3$COUNTRY_lab)))
    ds4
    
    #<labelled<integer>[12]>
    # [1] 1 1 2 2 3 3 4 4 5 5 6 6
    
    #Labels:
    # value label
    #     1    US
    #     2    CA
    #     3    MX
    #     4    DE
    #     5    FR
    #     6    IT