How do I keep the labels when merging data with left_join()?


I have two dataframes with manually added labels. I want to merge them while preserving the labels from both. This post suggests that left_join() does precisely that.

I tried the following approach:

library(dplyr)

# Create first sample dataframe
df1 = data.frame(matrix(1:10, nrow=5))
attributes(df1)$variable.labels[1] <- "Reference Variable"
attributes(df1)$variable.labels[2] <- "Label 1"

# Create second sample dataframe
df2 = data.frame(matrix(1:30, nrow=10))
names(df2)[2] <- "Y2"
names(df2)[3] <- "Y3"
attributes(df2)$variable.labels[2] <- "Label 2"
attributes(df2)$variable.labels[3] <- "Label 3"

# Merge both dataframes
merged_data <- left_join(x = df1, y = df2, by = "X1")

# Labels of df1 still exist while the ones from df2 don't
attributes(merged_data)$variable.labels[2]
attributes(merged_data)$variable.labels[3]
attributes(merged_data)$variable.labels[4]

In the merged dataframe the labels for Y2 and Y3 are missing. The desired outcome is to have the dataframe merged_data with all the labels from df1 and df2.

Is there a way to achieve that?


Solution

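  • One solution would be to reattach the attributes using attr(). Note that df2's label vector starts with an NA placeholder for the join key X1, so that entry has to be dropped to keep the labels aligned with the columns of merged_data. For instance:

    # Drop the label slot of the join key so labels line up with the columns
    attr(merged_data, "variable.labels") <- c(
      attr(df1, "variable.labels"),
      attr(df2, "variable.labels")[names(df2) != "X1"]
    )
    

    The same can be done inline in a pipe:

    merged_data <-
      left_join(x = df1, y = df2, by = "X1") %>%
      `attr<-`(
        "variable.labels",
        c(attr(df1, "variable.labels"),
          attr(df2, "variable.labels")[names(df2) != "X1"])
      )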
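
Reattaching labels by concatenation only works while label positions and column positions stay in step. As a more defensive variant, here is a minimal sketch that matches labels to columns by name. The helper left_join_keep_labels() is hypothetical (it is not part of dplyr), and it assumes that "variable.labels" is a positional vector as in the question and that the non-key columns of the two dataframes have distinct names, so left_join() adds no .x/.y suffixes.

```r
library(dplyr)

# Hypothetical helper, not a dplyr function: left-join x and y, then rebuild
# "variable.labels" so that entry i describes column i of the result.
# Assumption: non-key column names differ between x and y.
left_join_keep_labels <- function(x, y, by) {
  merged <- left_join(x, y, by = by)

  # Pair each positional label with its column name; pad missing ones with NA.
  named_labels <- function(df) {
    labels <- attr(df, "variable.labels")
    out <- rep(NA_character_, ncol(df))
    n <- min(length(labels), ncol(df))
    out[seq_len(n)] <- as.character(labels[seq_len(n)])
    setNames(out, names(df))
  }

  lx <- named_labels(x)
  ly <- named_labels(y)

  # Labels from x take precedence; y contributes only columns x lacks.
  all_labels <- c(lx, ly[setdiff(names(ly), names(lx))])

  # Look labels up by the merged column names, then store them positionally.
  attr(merged, "variable.labels") <- unname(all_labels[names(merged)])
  merged
}

# Sample data equivalent to the question's df1 and df2
df1 <- data.frame(matrix(1:10, nrow = 5))
attr(df1, "variable.labels") <- c("Reference Variable", "Label 1")

df2 <- data.frame(matrix(1:30, nrow = 10))
names(df2)[2:3] <- c("Y2", "Y3")
attr(df2, "variable.labels") <- c(NA, "Label 2", "Label 3")

merged_data <- left_join_keep_labels(df1, df2, by = "X1")
attr(merged_data, "variable.labels")
# "Reference Variable" "Label 1" "Label 2" "Label 3"
```

Because the lookup goes through names(merged), the result stays aligned even if the join key is not the first column of df2. Columns without a label, or columns renamed by a suffix, end up with NA.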