Tags: r, tidyverse, pivot-table, gtsummary

How to use gtsummary and tidyverse together for a cross-tabulation showing chi-square and t-test results in R


I want to cross-tabulate member and author in the rows against review, publish and pay in the columns, showing row and column totals with percentages in brackets and the chi-square test result in a footnote.

#data
set.seed(123)
member <- sample(c("Yes", "No"), 100, replace = TRUE)
author <- sample(c("Yes", "No"), 100, replace = TRUE)
review <- sample(0:10, 100, replace = TRUE)
publish <- sample(0:10, 100, replace = TRUE)
pay <- sample(0:10, 100, replace = TRUE)
data <- data.frame(member, author, review, publish, pay)

I want review, publish and pay to be grouped as No (0-4), Maybe (5) and Yes (6-10), as shown in the code below. So far I have used tidyverse for this. I recently found out about gtsummary, which should produce the result I want, but I'm struggling to replicate it there. This is what I have so far:

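library(dplyr)       # group_by(), summarise()
library(kableExtra)  # kbl(), kable_paper()

# Count responses per member in each review band
data |>
  group_by(member) |>
  summarise(
    Disagree = sum(review < 5),
    Neutral = sum(review == 5),
    Agree = sum(review > 5)) |>
  kbl(caption = "Review by member") |>
  kable_paper("hover", full_width = FALSE, html_font = "Cambria")

# Fisher's exact test on member vs. the raw review score (simulated p-value)
fisher.test(table(data$member, data$review), simulate.p.value = TRUE)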

Thanks for your help. I could not post an image of the output because I need 10 reputation (I don't know what that means).

The preferred output has review, publish and pay as three columns, each with the groups No, Maybe and Yes.


Solution

  • Update: We could add tbl_split(., c(author, review_group, publish_group, pay_group)) to the code:

    Here you will get 4 separate tables that you could put side by side:

    library(dplyr)
    library(gtsummary)
    
    data %>%
      mutate(across(c(review, publish, pay), ~cut(., breaks = c(-Inf, 4.5, 5.5, Inf),
                                                  labels = c("No", "Maybe", "Yes"),
                                                  include.lowest = TRUE), .names = "{.col}_group")) %>% 
      select(member, author, ends_with("group")) %>%
      tbl_summary(
        by = member,
        missing = "no", 
        statistic = list(all_categorical() ~ "{n} ({p}%)"),
        digits = list(all_categorical() ~ c(0, 1))
      ) %>%
      add_p(test = all_categorical() ~ "chisq.test") %>% 
      tbl_split(., c(author, review_group, publish_group, pay_group))
    

    First answer: We could do it this way:

    library(dplyr)
    library(gtsummary)
    
    data %>%
      mutate(across(c(review, publish, pay), ~cut(., breaks = c(-Inf, 4.5, 5.5, Inf),
                                                  labels = c("No", "Maybe", "Yes"),
                                                  include.lowest = TRUE), .names = "{.col}_group")) %>% 
      select(member, author, ends_with("group")) %>%
      tbl_summary(
        by = member,
        missing = "no", 
        statistic = list(all_categorical() ~ "{n} ({p}%)"),
        digits = list(all_categorical() ~ c(0, 1))
      ) %>%
      add_p(test = all_categorical() ~ "chisq.test")
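
Both pipelines above rely on cut() to bin the 0-10 scores. As a quick check of that step alone, the base R sketch below applies the same breaks to every possible score. The names `scores` and `groups` are illustrative only and are not part of the answer.

```r
# Every possible score on the 0-10 scale (illustrative vector, not the question's data)
scores <- 0:10

# Same breaks and labels as in the answer:
# (-Inf, 4.5] -> No, (4.5, 5.5] -> Maybe, (5.5, Inf] -> Yes
groups <- cut(scores,
              breaks = c(-Inf, 4.5, 5.5, Inf),
              labels = c("No", "Maybe", "Yes"))

# Pair each score with its group to confirm 0-4 = No, 5 = Maybe, 6-10 = Yes
data.frame(score = scores, group = groups)

# Count how many of the eleven scores land in each group
table(groups)
```

Because the scores are integers, placing the breaks at 4.5 and 5.5 avoids any question about which side of an interval a boundary value falls on.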
    

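
The question also asks for row and column totals. As a rough cross-check outside gtsummary, a base R sketch along the following lines rebuilds the member-by-review table with margins and runs the chi-square test directly. It assumes the seed and sampling calls from the question. `review_group` and `tab` are names chosen here for illustration, and the p-value is not guaranteed to match add_p() if the two use different test settings.

```r
# Recreate the relevant columns from the question (same seed and call order)
set.seed(123)
member <- sample(c("Yes", "No"), 100, replace = TRUE)
author <- sample(c("Yes", "No"), 100, replace = TRUE)
review <- sample(0:10, 100, replace = TRUE)

# Bin review with the same breaks used in the answer
review_group <- cut(review,
                    breaks = c(-Inf, 4.5, 5.5, Inf),
                    labels = c("No", "Maybe", "Yes"))

# Cross-tabulate member (rows) by review group (columns)
tab <- table(member, review_group)

# Counts with row and column totals appended
addmargins(tab)

# Row percentages, rounded to one decimal place
round(100 * prop.table(tab, margin = 1), 1)

# Pearson chi-square test of independence on the 2 x 3 table
chisq.test(tab)
```

If R warns that the chi-squared approximation may be incorrect, some expected counts are small. fisher.test(tab), the test already used in the question, is one alternative in that case.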