
How to label variable *values* in tbl_summary() tables?


I cannot seem to get my variable value labels to show up in my tbl_summary() table.

I have labeled my variables and variable values using the {labelled} package, as such:

library(dplyr)
library(labelled)
library(gtsummary)

var_label(df$SEX) <- "Sex"
val_label(df$SEX, 1) <- "Male"
val_label(df$SEX, 2) <- "Female"
 
table <- df %>% 
  select(SEX) %>%
  tbl_summary() 
  
table

When I make my summary table, the variable label for “SEX” shows up just fine, but the male and female value labels do not show up at all. Instead, the underlying codes 1 and 2 are displayed. How do I fix this?

In the documentation I read, it says “label attributes from the data set are automatically printed” and that “gtsummary leverages the labelled package”.

Thanks!


Solution

  • Thank you for the thoughtful post. I need to update the documentation to be clearer: the statement "Variable label attributes from the data set are automatically printed" refers only to variable labels; it does not, in fact, apply to value labels. The haven_labelled class (i.e. a column carrying value labels) was never meant to be used directly in analysis or data exploration. Rather, it was created as an in-between for importing data from other languages whose data types don't have a one-to-one relationship with R. This is from the haven article about the labelled class of variables (https://haven.tidyverse.org/articles/semantics.html):

    The goal of haven is not to provide a labelled vector that you can use everywhere in your analysis. The goal is to provide an intermediate data structure that you can convert into a regular R data frame.

    For the time being, I recommend converting the variables with value labels to factors with haven::as_factor(df), which can be run on the entire data frame.

    Utilizing your example above, this is the code I would run:

    library(gtsummary)
    library(tidyverse)
    
    df %>% 
      haven::as_factor() %>%
      select(SEX) %>%
      tbl_summary() 
    
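    Since the original post does not include the data, here is a minimal, self-contained sketch of the same approach. The toy data frame and the name df_factor are made up for illustration, and the value labels are set with labelled::val_labels() in one call rather than with val_label() as in the question. It is meant only to show what the conversion does before the table is built:

    ```r
    library(labelled)
    library(haven)
    library(gtsummary)

    # Hypothetical toy data standing in for the asker's `df`
    df <- data.frame(SEX = c(1, 2, 2, 1, 2))

    # Attach a variable label and value labels, as in the question
    var_label(df$SEX) <- "Sex"
    val_labels(df$SEX) <- c(Male = 1, Female = 2)

    # Convert every labelled column to a factor; the value labels become levels
    df_factor <- as_factor(df)

    # Inspect the result: a factor whose levels are the value labels
    levels(df_factor$SEX)

    # Summarise the converted data
    tbl_summary(df_factor)
    ```

    Keeping the converted data in a separate object makes it easy to compare the labelled and factor versions side by side if the table does not look as expected.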

    Specific to the labelled and gtsummary packages, the labelled package author has offered this guidance: https://github.com/ddsjoberg/gtsummary/issues/488#issuecomment-682576441

    Happy Programming!