r, correlation, chi-squared, hypothesis-test

Chi square tests for multiple columns in R


I created the following data:

data<-data.frame(alzheimer=c(1,1,0,1,0,0,1,0,0,0),
                 asthma=c(1,1,0,0,1,1,1,1,0,0),
                 points=c(0,1,3,5,3,2,1,2,1,5),
                 sex=c(1,1,0,0,0,0,1,1,1,0))

I want to know whether sex affects alzheimer, asthma, or points, so I am considering chi-square tests of independence. alzheimer and asthma are binary variables, so I think I can count the cases for sex==1 and sex==0 separately and build contingency tables for the chi-square tests. For the variable points, I don't know whether a chi-square test is appropriate, because points is an ordinal variable that takes only integer values from 0 to 5.

To sum up, I want to do 3 tests.

  1. Are sex and alzheimer independent?
  2. Are sex and asthma independent?
  3. Are sex and points independent?

Additionally, my actual data has many columns, so I need to know how to run many tests at once and write the results to a CSV file. The CSV file should include the test statistics and p-values.


Solution

  • We could write a function stat_test that applies chisq.test to binary columns and wilcox.test to all other columns (assuming they are all ordinal). The function returns three things:

    1. the name of the test
    2. the value of the test statistic (stats)
    3. the p-value

    Then we could use dplyr::across() to apply this test to all columns (except the alzheimer column, which is passed as the y input to our function). Afterwards we add the row labels as the first column.

    data <- data.frame(alzheimer=c(1,1,0,1,0,0,1,0,0,0),
                       asthma=c(1,1,0,0,1,1,1,1,0,0),
                       points=c(0,1,3,5,3,2,1,2,1,5),
                       sex=c(1,1,0,0,0,0,1,1,1,0))
    
    library(dplyr)
    
    # Choose the test from the number of distinct non-missing values in x:
    # binary columns get a chi-square test, all others a Wilcoxon rank-sum test.
    stat_test <- function(x, y) {
      if (length(unique(na.omit(x))) <= 2) {
        # chisq.test() cross-tabulates x and y into a contingency table
        res <- chisq.test(x = x, y = y)
        label <- "chi_square"
      } else {
        # formula interface: compare x between the two groups defined by y
        res <- wilcox.test(x ~ y)
        label <- "wilcox"
      }
      
      # c() coerces everything to character, so each result column is character
      c(
        test = label,
        stats = res$statistic,
        p_val = res$p.value
      )
    }
    
    data %>% 
      as_tibble() %>% 
      summarise(across(-alzheimer,
                       ~ stat_test(.x, alzheimer))) %>% 
      mutate(label = c("test", "stats", "pvalue"), .before = 1L)
    # The result is a 3-row tibble (test, stats, pvalue) with one character
    # column per tested variable: asthma and sex use chi_square, points uses wilcox.
    # On this small sample, chisq.test() warns that the chi-squared approximation
    # may be incorrect, and wilcox.test() warns that it cannot compute an exact
    # p-value with ties.
    

    Created on 2022-09-27 by the reprex package (v2.0.1)
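
The question asks about sex as the grouping variable and about exporting the results to a CSV file. The following is a minimal base R sketch of that variant, not part of the answer above: the helper name test_against, the result column names, and the output path are all assumptions chosen for illustration. It assumes the grouping column has exactly two levels and that every binary column has both values present.

```r
data <- data.frame(alzheimer = c(1,1,0,1,0,0,1,0,0,0),
                   asthma    = c(1,1,0,0,1,1,1,1,0,0),
                   points    = c(0,1,3,5,3,2,1,2,1,5),
                   sex       = c(1,1,0,0,0,0,1,1,1,0))

# Hypothetical helper: test one column x against a two-level grouping vector
# and return a one-row data frame with the test name, statistic and p-value.
test_against <- function(x, group) {
  if (length(unique(na.omit(x))) <= 2) {
    res <- chisq.test(x, group)      # binary column: chi-square test
    label <- "chi_square"
  } else {
    res <- wilcox.test(x ~ group)    # ordinal column: Wilcoxon rank-sum test
    label <- "wilcox"
  }
  data.frame(test      = label,
             statistic = unname(res$statistic),
             p_value   = res$p.value)
}

# Test every column except the grouping column, one result row per variable.
outcomes <- setdiff(names(data), "sex")
results <- do.call(rbind, lapply(outcomes, function(v) {
  data.frame(variable = v, test_against(data[[v]], data$sex))
}))

# Hypothetical output location; replace with the path you actually want.
out_file <- file.path(tempdir(), "test_results.csv")
write.csv(results, out_file, row.names = FALSE)
```

Keeping one row per variable leaves the statistic and p-value columns numeric, which is usually easier to read back from a CSV than the wide character layout. With only ten observations both tests emit warnings about their approximations, so the p-values from this toy data should be treated with caution.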