
Two types of missing values for gtsummary::tbl_summary


I have a dataset with a continuous variable that I want to run through tbl_summary. The variable is stored as character because, besides numbers, it can contain two possible non-numeric codes. A simple example below:

> df <- data.frame(type = c("A", "B", "A", "A", "B"), 
                   value = c("1", "2", "3", "NA", "Never Entered"))
> df
  type         value
1    A             1
2    B             2
3    A             3
4    A            NA
5    B Never Entered

These are the only two non-numeric values this variable will ever have. When converting value with as.numeric the non-numeric values are converted to NA. In tbl_summary you have the option to specify missing_text, but I want to specify when something is "NA" versus "Never Entered".
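For illustration, here is a minimal base R sketch of that coercion, and of how the reason can be recorded before it is lost. The names `missing_reason`, `value_num` and `reason_counts` are made up for this example.

```r
# Same toy data as above
df <- data.frame(
  type  = c("A", "B", "A", "A", "B"),
  value = c("1", "2", "3", "NA", "Never Entered")
)

# Record WHY a value is non-numeric before coercing
# (`missing_reason` is a hypothetical column name)
df$missing_reason <- ifelse(
  df$value %in% c("NA", "Never Entered"),
  df$value,
  NA_character_
)

# After coercion both codes are plain NA and can no longer be told apart
df$value_num <- suppressWarnings(as.numeric(df$value))

# Cross-tabulate the recorded reasons by group (true NAs are dropped by table())
reason_counts <- table(df$type, df$missing_reason)
reason_counts
```

This only counts the two codes per group; it does not yet put them into the tbl_summary output.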

library(dplyr)
library(gtsummary)

df |> 
  mutate(
    value = as.numeric(value)  # both "NA" and "Never Entered" become NA here
  ) |> 
  tbl_summary(
    by = "type",
    type = list(everything() ~ "continuous"),
    statistic = everything() ~ "{median} ({min}-{max})",
    digits = everything() ~ 0,
    missing = "always",
    missing_text = "Never Entered"
  )

[Image: example tbl_summary output]

Ideally the output above should show "Never Entered" as "0" for column A, and "1" for column B. Thoughts and input appreciated.


Solution

  • I don't believe missing_text can be used the way you want, so I think the next best option is to create two indicator variables that flag whether value is "NA" or "Never Entered". I used ifelse to create them, assigning a single blank space when the condition holds and 0 otherwise. Then I added both variables to tbl_summary and filtered the 0 rows out of the table body.

    library(tidyverse)
    library(gtsummary)
    
    df <- data.frame(type = c("A", "B", "A", "A", "B"), 
                     value = c("1", "2", "3", "NA", "Never Entered")) %>%
      # Flag each non-numeric code before coercion: " " when present, 0 otherwise
      # (type is never "NA" or "Never Entered", so only value needs checking)
      mutate(isitNA = ifelse(value == "NA", " ", 0)) %>%
      mutate(isitNeverEntered = ifelse(value == "Never Entered", " ", 0)) %>%
      mutate(value = as.numeric(value))
    
    df %>%
      tbl_summary(
        by = type, 
        missing = "no",
        label = list(value ~ "Value", 
                     isitNA ~ "NA",
                     isitNeverEntered ~ "Never Entered"),
        type = list(value ~ 'continuous',
                    isitNA ~ 'categorical',
                    isitNeverEntered ~ 'categorical'),
        statistic = all_continuous() ~ "{median} ({min}-{max})"
      ) %>% 
      modify_table_body(filter, !(variable == "isitNeverEntered" & label == "0")) %>%
      modify_table_body(filter, !(variable == "isitNA" & label == "0"))
    

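A possible variation on the same idea, offered as a sketch and not as the only way to do it: keep the reason in a single factor column and summarise it as categorical. Because tbl_summary reports every factor level, a reason with no occurrences in a group should still appear with a count of 0. The column name `missing_reason` and the level label "Recorded as NA" are hypothetical choices for this example.

```r
library(dplyr)
library(gtsummary)

df <- data.frame(
  type  = c("A", "B", "A", "A", "B"),
  value = c("1", "2", "3", "NA", "Never Entered")
) %>%
  mutate(
    # Capture the reason first; rows with a real number get NA here
    missing_reason = factor(
      case_when(
        value == "NA"            ~ "Recorded as NA",
        value == "Never Entered" ~ "Never Entered"
      ),
      levels = c("Recorded as NA", "Never Entered")
    ),
    # Coerce only after the reason has been captured
    value = suppressWarnings(as.numeric(value))
  )

tbl <- df %>%
  tbl_summary(
    by = type,
    missing = "no",
    label = list(value ~ "Value", missing_reason ~ "Missing reason"),
    type = list(value ~ "continuous", missing_reason ~ "categorical"),
    statistic = list(
      all_continuous()  ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n}"   # counts only, no percentages
    )
  )

tbl
```

Compared with the two indicator columns, this avoids the modify_table_body filtering step, at the cost of grouping both reasons under one row header.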