Search code examples
rdataframestatisticstibble

Fisher's exact test in R from dataframe


I have input data (df) for making 2*2 contingency table for each row.

df <- data.frame(as = c("A", "B", "C", "D"), sum_m = c(47, 8, 93, 73), 
           length_m = c(150, 150, 150, 150), sum_w = c(66, 183, 44, 113), length_w = c(199, 199, 199, 199), 
           pooled_p = c(0.32378223495702, 0.547277936962751, 0.392550143266476, 0.532951289398281), 
           test1 = c(TRUE, TRUE, TRUE, TRUE), test2 = c(TRUE, TRUE, TRUE, TRUE), test3 = c(TRUE, TRUE, TRUE, TRUE), 
           test4 = c(TRUE, TRUE, TRUE, TRUE), final_test = c(TRUE, TRUE, TRUE, TRUE))

I wrote a small script (given below) for calculating p value for a single row:

# Chi-square or Fisher's exact test
x    <- c(sum_m, sum_w)
n    <- c(length_m, length_w)
mash <- rbind(c(sum_m, length_m - sum_m),
              c(sum_w, length_w - sum_w))


if(final_test == TRUE){
  
  ## With Yate's continuity correction
  
  prop.test(x,n)
  #Exactly the same as:
  chisq.test(mash)
  
}else{
  
  # Fisher's exact test
  fisher.test(mash)
  
}

hopefully, this makes sense to you.

Suggestions on how to apply this to a large number of rows would be greatly appreciated! If possible then please paste the p-value at the last column.

Thanks in advance :X)


Solution

  • We could wrap the code into a function and then use rowwise and apply the function

    library(dplyr)
    library(tidyr)
    df %>%
       rowwise %>% 
       mutate(out = list(f1(sum_m, sum_w, length_m, length_w, final_test) %>% 
            broom::tidy(.)))  %>%
       ungroup %>%
       unnest(out)
    

    -output

    # A tibble: 4 × 15
      as    sum_m length_m sum_w length_w pooled_p test1 test2 test3 test4 final_test statistic  p.value parameter method                          
      <chr> <dbl>    <dbl> <dbl>    <dbl>    <dbl> <lgl> <lgl> <lgl> <lgl> <lgl>          <dbl>    <dbl>     <int> <chr>                           
    1 A        47      150    66      199    0.324 TRUE  TRUE  TRUE  TRUE  TRUE          0.0608 8.05e- 1         1 Pearson's Chi-squared test with…
    2 B         8      150   183      199    0.547 TRUE  TRUE  TRUE  TRUE  TRUE        256.     1.59e-57         1 Pearson's Chi-squared test with…
    3 C        93      150    44      199    0.393 TRUE  TRUE  TRUE  TRUE  TRUE         55.4    9.77e-14         1 Pearson's Chi-squared test with…
    4 D        73      150   113      199    0.533 TRUE  TRUE  TRUE  TRUE  TRUE          1.95   1.63e- 1         1 Pearson's Chi-squared test with…
    

    It may be faster with pmap instead of rowwise

    library(purrr)
    df %>% 
       mutate(out = pmap(across(c(sum_m, sum_w, length_m, length_w, final_test)), 
         ~ f1(..1, ..2, ..3, ..4, ..5) %>% 
                broom::tidy(.))) %>%
       unnest(out)
    

    -output

    # A tibble: 4 × 15
      as    sum_m length_m sum_w length_w pooled_p test1 test2 test3 test4 final_test statistic  p.value parameter method                          
      <chr> <dbl>    <dbl> <dbl>    <dbl>    <dbl> <lgl> <lgl> <lgl> <lgl> <lgl>          <dbl>    <dbl>     <int> <chr>                           
    1 A        47      150    66      199    0.324 TRUE  TRUE  TRUE  TRUE  TRUE          0.0608 8.05e- 1         1 Pearson's Chi-squared test with…
    2 B         8      150   183      199    0.547 TRUE  TRUE  TRUE  TRUE  TRUE        256.     1.59e-57         1 Pearson's Chi-squared test with…
    3 C        93      150    44      199    0.393 TRUE  TRUE  TRUE  TRUE  TRUE         55.4    9.77e-14         1 Pearson's Chi-squared test with…
    4 D        73      150   113      199    0.533 TRUE  TRUE  TRUE  TRUE  TRUE          1.95   1.63e- 1         1 Pearson's Chi-squared test with…
    

    -function

    f1 <- function(sum_m, sum_w, length_m, length_w, final_test) {
     
     x    <- c(sum_m, sum_w)
     n    <- c(length_m, length_w)
     mash <- rbind(c(sum_m, length_m - sum_m),
                   c(sum_w, length_w - sum_w))
    
    
     if(final_test == TRUE){
      
       ## With Yate's continuity correction
      
       prop.test(x,n)
       #Exactly the same as:
       chisq.test(mash)
      
     }else{
      
       # Fisher's exact test
       fisher.test(mash)
      
     }
     
     
     
     
     }