r, tidyverse, gtsummary

How to modify N in gtsummary / how to shape one-hot-encoded data for gtsummary


I have survey data which is one-hot encoded, which I have then reshaped into long format so that I can compare variables within groups. The problem is that this has created a "magical" increase in my N, because each person now occupies several rows. I have retained an id column in my data frame, so I can easily obtain the real N using length(unique(id)) to count the number of different people who provided data.

However, the N given in the table is based on the number of rows. Is there a way to make tbl_summary() report N based on the unique ids? Note that I have been dropping the id column before calling tbl_summary() to avoid getting summary statistics for it.

The other question I've been wondering about is whether there is a better way to shape my data so that it pairs well with gtsummary.


library(tidyverse)
library(gtsummary)

drug1_dose = rnorm(100)
drug2_dose = rnorm(100)

# add an id column identifying each respondent
df <- data.frame(drug1_dose, drug2_dose) %>%
  rowid_to_column("id")

# reshape to long format, drop the id column, then summarise
df <- df %>%
  rename(drug1 = drug1_dose) %>%
  rename(drug2 = drug2_dose) %>%
  pivot_longer(c(drug1, drug2), names_to = "drug", values_to = "dose", values_drop_na = TRUE) %>%
  select(-id) %>%
  tbl_summary()


It is worth mentioning that in my data there are several cases where there is only data for drug1 or only for drug2, as the two groups overlap but are not the same. I was not sure how to show this in a reprex.

Thank you in advance!


Solution

  • You can use the modify_header() function to change the header to whatever you'd like. Details at http://www.danieldsjoberg.com/gtsummary/reference/modify.html

    library(gtsummary)
    library(tidyverse)
    packageVersion("gtsummary")
    #> [1] '1.4.0'
    
    drug1_dose = rnorm(100)
    drug2_dose = rnorm(100)
    
    df <- 
      data.frame(drug1_dose, drug2_dose) %>%
      rowid_to_column("id") %>%
      rename(drug1 = drug1_dose) %>%
      rename(drug2 = drug2_dose) %>%  
      pivot_longer(c(drug1, drug2), names_to = "drug", values_to = "dose", values_drop_na = TRUE)
    
    tbl <- 
      df %>%
      select(-id) %>%
      tbl_summary() %>%
      modify_header(stat_0 = "**N = 100**")
    

    Created on 2021-04-21 by the reprex package (v2.0.0)
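
If you would rather not hard-code the number, a minimal sketch along the same lines is to count the distinct ids before dropping the id column and paste that count into the header. The simulated NA pattern, the seed, and the names wide, long and n_people are assumptions made for illustration only; they are not from the original data.

```r
library(gtsummary)
library(tidyverse)

set.seed(1)  # assumption: fixed seed so the simulated data are reproducible

# Simulated wide data. Some doses are set to NA to mimic respondents
# who only have data for one of the two drugs (an assumed pattern).
wide <- tibble(
  id    = 1:100,
  drug1 = rnorm(100),
  drug2 = rnorm(100)
)
wide$drug1[1:10]   <- NA  # ids 1-10 have drug2 only
wide$drug2[91:100] <- NA  # ids 91-100 have drug1 only

# Reshape to long format; rows with a missing dose are dropped
long <- wide %>%
  pivot_longer(c(drug1, drug2), names_to = "drug", values_to = "dose",
               values_drop_na = TRUE)

# Count people, not rows, while the id column is still available
n_people <- n_distinct(long$id)

# Build the table without id and put the person count in the header
tbl <- long %>%
  select(-id) %>%
  tbl_summary() %>%
  modify_header(stat_0 = paste0("**N = ", n_people, "**"))
```

The header is built with paste0() rather than a glue-style placeholder, since the N that tbl_summary() tracks internally is the row count of the long data, which is exactly the number you want to avoid.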