
Assigning labels to a data frame from bank of possible labels


I would like to create a function that updates a data frame from a different environment. Specifically, I would like to update the labels of a data frame using the Hmisc::label() function.

assign_label <- function(df, col) {
  col <- rlang::as_name(rlang::ensym(col))
  Hmisc::label(df[,col]) <- fetch_label(col)
}

fetch_label <- function(col) {
  val <- c("mpg" = "MPG",
           "hp" = "HP") 
  unname(val[col])
}

The following code executes without issue: assign_label(mtcars, hp)

However, it does not actually alter the data frame in the calling environment. I just can't figure out how to make it do what I imagine.

Ideally, I would like to be able to pipe a dataframe to this function as such:

mtcars %>% assign_label(mpg)


Solution

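    1) Return modified object. Modifying objects in place is discouraged in R. The function in the question changes only its own local copy of df, so the caller's data frame is untouched. The usual approach is to return the modified data frame and assign it either to a new name or back to the original name, which overwrites or shadows the original.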

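    library(magrittr)  # provides the %>% pipe

    assign_label <- function(df, col) {
      col <- deparse(substitute(col))  # capture the unquoted column name as a string
      Hmisc::label(df[[col]]) <- fetch_label(col)
      df  # return the modified copy
    }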
    
    mtcars_labelled <- mtcars %>% assign_label(mpg)
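
To see why returning the object matters, here is a minimal base-R sketch of the same behaviour. It assumes a hypothetical helper `set_label()` that stores the label with `attr()` as a stand-in for `Hmisc::label<-`, so it runs without extra packages; the names `set_label`, `cars` and `cars_labelled` are illustrative only.

```r
# Hypothetical helper: base-R stand-in for Hmisc::label<-,
# storing the label as a "label" attribute on the column.
set_label <- function(df, col, text) {
  attr(df[[col]], "label") <- text  # changes the function's local copy only
  df                                # hand the modified copy back to the caller
}

cars <- datasets::mtcars

# Result discarded: the caller's object is not modified.
invisible(set_label(cars, "mpg", "MPG"))
is.null(attr(cars$mpg, "label"))    # TRUE

# Result assigned: the new object carries the label.
cars_labelled <- set_label(cars, "mpg", "MPG")
attr(cars_labelled$mpg, "label")    # "MPG"
```

The first call shows the situation in the question: the work is done on a copy that is then thrown away.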
    

    2) magrittr. Despite what we have said above, there are some facilities for modifying in place in R and in some R packages. The magrittr package provides the %<>% assignment pipe, which overwrites or shadows its input. Using the definition in (1) we can write:

    library(magrittr)  # provides %<>%
    mtcars %<>% assign_label(mpg)
    

    If mtcars were in the global environment, this would overwrite it with the new value. In this case mtcars lives in the datasets package, so a new mtcars is written to the caller's environment and the original in datasets is unchanged.
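
The shadowing can be checked with a short base-R sketch. It uses a plain assignment, which has the same effect on name lookup as the assignment pipe, and `transform()` as an arbitrary stand-in for the labelling step; the variable `shadowed` is illustrative only.

```r
# Assigning to the name `mtcars` creates a new object in the current
# environment; the dataset inside the datasets package is not touched.
mtcars <- transform(datasets::mtcars, mpg = mpg * 2)

# The local copy now hides (shadows) the packaged one.
shadowed <- !identical(mtcars, datasets::mtcars)
shadowed                               # TRUE

# Removing the local copy makes the name resolve to the package data again.
rm(mtcars)
identical(mtcars, datasets::mtcars)    # TRUE
```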

    3) replacement function. Although not widely used, R does provide replacement functions, which are defined and used as shown below. This also overwrites or shadows the input.

    `assign_label<-` <- function(df, value) {
      Hmisc::label(df[[value]]) <- fetch_label(value)
      df
    }
    
    assign_label(mtcars) <- "mpg"
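
As a rough illustration of what the replacement syntax does, the sketch below mirrors the definition above but swaps in base-R `attr()` for `Hmisc::label<-` so that it is self-contained. The variables `cars` and `cars2` are illustrative only.

```r
# Label bank from the question.
fetch_label <- function(col) {
  val <- c("mpg" = "MPG", "hp" = "HP")
  unname(val[col])
}

# Replacement function: the last argument must be named `value`.
`assign_label<-` <- function(df, value) {
  attr(df[[value]], "label") <- fetch_label(value)  # stand-in for Hmisc::label<-
  df
}

# Replacement-call syntax ...
cars <- datasets::mtcars
assign_label(cars) <- "mpg"

# ... is shorthand for calling the function and reassigning the result.
cars2 <- datasets::mtcars
cars2 <- `assign_label<-`(cars2, value = "hp")

attr(cars$mpg, "label")   # "MPG"
attr(cars2$hp, "label")   # "HP"
```

Both forms reassign the name on the left-hand side, which is why the input appears to be modified in place.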
    

    Note

    As an aside, if the aim is an interface that is consistent with the tidyverse, use tidyselect (here via dplyr's select) to retrieve the column name(s) so that examples like the following work:

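    library(dplyr)  # select(), starts_with() and the %>% pipe

    assign_labels <- function(df, col) {
      nms <- names(select(df, {{col}}))  # resolve the tidyselect expression to column names
      for(nm in nms) Hmisc::label(df[[nm]]) <- fetch_label(nm)
      df
    }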
    
    mtcars_labelled <- mtcars %>% assign_labels(starts_with("mp"))
    str(mtcars_labelled)
    
    mtcars_labelled <- mtcars %>% assign_labels(mpg|hp)
    str(mtcars_labelled)