
Different tests for different variables of the same data type in gtsummary


I have a data frame and want to show p-values in a baseline table using gtsummary, with different tests for different variables of the same data type (e.g., Fisher's exact test for some categorical variables and chi-square tests for others).

For example,

library(gtsummary)

# create example data
set.seed(123)
mydata <- data.frame(a = sample(c("Yes", "No"), 100, replace = TRUE),
                     b = sample(c("Yes", "No"), 100, replace = TRUE),
                     c = sample(c("Yes", "No"), 100, replace = TRUE),
                     d = sample(c("Low", "Medium", "High"), 100, replace = TRUE),
                     e = sample(c("Group 1", "Group 2", "Group 3"), 100, replace = TRUE),
                     f = sample(c("Male", "Female"), 100, replace = TRUE),
                     g = rnorm(100),
                     h = rnorm(100))

I would like b and c to be tested with fisher.test and d, e and f with chisq.test, with the table split by a.

I tried:

mydata %>%
  tbl_summary(
    by = a,
    statistic = list(all_continuous() ~ "{median} ({p25}-{p75})",
                     all_categorical() ~ "{n} / {N} ({p}%)")
  ) %>%
  add_p(all_continuous() ~ 't.test',
        all_categorical(-c('b','c')) ~ "chisq.test",
        c('b', 'c') ~ "fisher.test",
        pvalue_fun = function(x) style_number(x, digits = 3))

This does not work. I suspect the problem is all_categorical(-c('b', 'c')). Is there a quick way to exclude certain variables from all_categorical()?

A more advanced question: how can I have the function detect which test is most appropriate? I found that add_p() does not automatically switch to Fisher's exact test when the data do not meet the assumptions of the chi-square test.

Thank you all for the kind help!


Solution

  • You need to pass the formulas as a single list to the test argument of add_p(). I am also not sure that the syntax all_categorical(-c('b','c')) is valid, so I changed it below.

    mydata %>%
      tbl_summary(
        by = a,
        statistic = list(all_continuous() ~ "{median} ({p25}-{p75})",
                         all_categorical() ~ "{n} / {N} ({p}%)")
      ) %>%
      add_p(
        # one formula per group of variables, all wrapped in a single list
        test = list(c("g", "h") ~ "t.test",
                    c("d", "e", "f") ~ "chisq.test",
                    c("b", "c") ~ "fisher.test"),
        # pvalue_fun is an argument of add_p(), not an element of the test list
        pvalue_fun = function(x) style_number(x, digits = 3)
      )
    

    Sorry, when I tried -c("b","c") for the chi-square test, the selection also included the continuous variables. Above I explicitly list the variables for each test. If you need it to be more dynamic, I can amend the code.

    Edit: First, identify all categorical variables and save their names as colchar. Then remove from that vector the by variable and the two variables that should get Fisher's exact test. Finally, pass colchar_chisq as the variable selection for chisq.test in the test argument of add_p().

    colchar <- colnames(mydata)[sapply(mydata, is.character)]
    
    colchar_chisq <- setdiff(colchar, c("a","b","c"))
    
    mydata %>%
      tbl_summary(
        by = a,
        statistic = list(all_continuous() ~ "{median} ({p25}-{p75})",
                         all_categorical() ~ "{n} / {N} ({p}%)")
      ) %>%
      add_p(
        test = list(all_continuous() ~ "t.test",
                    all_of(colchar_chisq) ~ "chisq.test",
                    c("b", "c") ~ "fisher.test"),
        # pvalue_fun is an argument of add_p(), not an element of the test list
        pvalue_fun = function(x) style_number(x, digits = 3)
      )
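
The follow-up question about picking the test automatically is not addressed above. Below is a minimal base R sketch of one possible approach, assuming the common rule of thumb that Fisher's exact test is preferred when any expected cell count is below 5. The helper choose_test() and the toy data frame are hypothetical names introduced only for illustration; they are not part of gtsummary.

```r
# Hypothetical helper (not a gtsummary function): pick a test name for one
# categorical variable, using the rule of thumb
# "any expected cell count < 5 -> Fisher's exact test".
choose_test <- function(data, var, by) {
  tab <- table(data[[var]], data[[by]])
  # chisq.test() may warn about small counts; only the expected counts are used
  expected <- suppressWarnings(chisq.test(tab)$expected)
  if (any(expected < 5)) "fisher.test" else "chisq.test"
}

# Toy data (an assumption for this sketch): 'common' is balanced across groups,
# 'rare' has a level with only two observations.
toy <- data.frame(
  grp    = rep(c("Yes", "No"), each = 20),
  common = rep(c("A", "B"), times = 20),
  rare   = c(rep("X", 38), rep("Y", 2))
)

cat_vars <- c("common", "rare")
tests <- vapply(cat_vars, choose_test, character(1), data = toy, by = "grp")

# Split the variable names by the test chosen for them
fisher_vars <- names(tests)[tests == "fisher.test"]
chisq_vars  <- names(tests)[tests == "chisq.test"]
print(tests)
```

The resulting character vectors could then be used in the test list in the same way as colchar_chisq above, for example all_of(chisq_vars) ~ "chisq.test" and all_of(fisher_vars) ~ "fisher.test".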