Search code examples
rsummarysummarizegtsummary

R: Create summary table using T1 data for all groups, and T2 data for one group only


I am trying to create a summary table in R of baseline (T1) scores for all participants, grouped by another column with three variables (Group1, Group2, Group3), as well as outcome scores for Group3 only, as Group1 and Group2 do not have this data. I would like the table to look something like this (I have T1 and T2 there as headers above headers but I can't figure out how to do this here):

                                                     T1                                        T2
Group 1 Group 2 Group 3 Group 1
measure1 Score Score Score Score
measure2 Score Score Score Score
measure3 Score Score Score Score
measure4 Score Score Score Score

My data are currently in wide format but I've also transformed it into long format to see if it would be achieveable this way but no success yet with any method I've chosen.

My variables in wide format would be = group, measure1_t1, measure2_t1, measure_3t1, measure4_t1, measure1_t2, measure2_t2, measure3_t2, measure4_t2.

In long format, these would be group, time, measure1, measure2, measure3, measure4

Would anyone have any advice on how I could achieve this? I can't seem to get it without including columns for group2 and group3 for the measures for T2. So far, I've tried using gt_summary and dplyr::summarise but with no success, but I'm open to using other packages/functions.

Alternatively, if there's a way to combine two tables to achieve this instead of doing one table only I'm happy to explore that option

Thanks


Solution

  • I was able to achieve this through creating two separate tables using table_summary and using table_merge to bring them together.

    First table:

    # Filtering dataset to include only T1 data, and selecting relevant variables.
    t1_prep <- df %>%
      dplyr::filter(time == "t1") %>%
      dplyr::select(group, measure1, measure2, measure3, measure4)
    
    #Creating the summary table
    t1_sum <- t1_prep %>%
      tbl_summary(by = group,
                  statistic = list(all_continuous() ~ "{mean} ({sd})"),
                  missing_text = "Missing"
                  ) %>%
      add_overall() %>%
      add_n() %>%
      modify_header(label ~ "**Variable**") %>%
      modify_footnote(
        all_stat_cols() ~ "Mean (SD)"
      )
    

    Second table

    # Filtering dataset, this time to specify at T2, and group 1 only. In select, no longer choose the group variable as this is no longer wanted. 
    t2_prep <- df %>%
      dplyr::filter(time == "t2", group == "Group1") %>%
      dplyr::select(measure1, measure2, measure3, measure4)
    
    t2_sum <- t2_prep %>%
      tbl_summary(by = NULL,
                  statistic = list(all_continuous() ~ "{mean} ({sd})"),
                  missing_text = "Missing") %>%
     add_n() %>%
    #Note that must specify the group and number included in that group below for consistency when merging the tables - this would only state N = 65 otherwise
      modify_header(label ~ "**Variable**", all_stat_cols()  ~ "**Completer** N = 65") %>%
      modify_footnote(
        all_stat_cols() ~ "Mean (SD)"
      )
    

    Merging

    t1_t2_merge <-
      tbl_merge(
        list(t1_sum, t2_sum),
        tab_spanner = c("**Pre-Intervention**", "**Post-Intervention**")
      )  %>%
       as_gt() %>% #this and belowis not necessary but I wanted to add styling
      gt::tab_options(table.font.names = "Times New Roman") %>%
      gt::opt_row_striping() 
    t1_t2_merge