Keep multiple values of chisq.test in summarised tibble


I have grouped data on which I'm performing a chi-squared test, and I would like to return a summary table that includes multiple values from the htest object. For example (from a previous question),

library(dplyr)

set.seed(1)
foo <- data.frame(
  partido=sample(c("PRI", "PAN"), 100, 0.6),
  genero=sample(c("H", "M"), 100, 0.7), 
  GM=sample(c("Bajo", "Muy bajo"), 100, 0.8)
)

foo %>% 
  group_by(GM) %>% 
  summarise(p.value=chisq.test(partido, genero)$p.value)

returns the p-value, but instead I would like multiple values (say p.value and statistic) from the htest object to be returned as different columns in the summary table.

I've tried

foo %>%
  group_by(GM) %>%
  summarise(htest=chisq.test(partido, genero)) %>%
  mutate(p.value=htest$p.value, statistic=htest$statistic)

but that throws an error

Error in summarise_impl(.data, dots) :
Column htest must be length 1 (a summary value), not 9

How do you accomplish this with the tidyverse tools?


Solution

  • One option is to nest the data by group and use broom::tidy, which converts each htest object into a one-row tibble:

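    library(broom)
    library(tidyverse)
    foo %>%
        group_by(GM) %>%
        nest() %>%    # one row per GM, with the group's rows in a `data` list-column
        transmute(
            GM,
            # run the test on each group's data and tidy the htest into a one-row tibble
            res = map(data, ~tidy(chisq.test(.x$partido, .x$genero)))) %>%
        unnest(res)   # name the column; a bare unnest() is deprecated in tidyr >= 1.0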
    ## A tibble: 2 x 5
    #  GM      statistic p.value parameter method
    #  <fct>       <dbl>   <dbl>     <int> <chr>
    #1 Bajo       0.0157   0.900         1 Pearson's Chi-squared test with Yates' c…
    #2 Muy ba…    0.504    0.478         1 Pearson's Chi-squared test with Yates' c…
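
The error in the question arises because summarise() expects each summary value to have length 1, while an htest object is a list with several components. A minimal sketch of an alternative that needs only dplyr, assuming it is acceptable to wrap the test result in list() so it becomes a single list-column entry per group (the name `res` and the vapply() extraction are illustrative choices, not part of the original answer):

```r
library(dplyr)

# Same example data as in the question
set.seed(1)
foo <- data.frame(
  partido = sample(c("PRI", "PAN"), 100, 0.6),
  genero  = sample(c("H", "M"), 100, 0.7),
  GM      = sample(c("Bajo", "Muy bajo"), 100, 0.8)
)

res <- foo %>%
  group_by(GM) %>%
  # list() makes the whole htest object a length-1 value per group
  summarise(htest = list(chisq.test(partido, genero))) %>%
  mutate(
    # pull one number out of each stored htest object
    p.value   = vapply(htest, function(h) h$p.value, numeric(1)),
    statistic = vapply(htest, function(h) unname(h$statistic), numeric(1))
  ) %>%
  select(-htest)

print(res)
```

This keeps only the columns that are explicitly extracted, which may be preferable when the other fields returned by broom::tidy (such as method) are not needed.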