Tags: r, testing, dplyr, chi-squared

Chi-square tests for different groups in an R data frame


I have a huge dataframe with the following basic structure:

data <- data.frame(species = factor(c(rep("species1", 4), rep("species2", 4), rep("species3", 4))),
                 trap = c(rep(c("A","B","C","D"), 3)),
                 count=c(6,3,7,9,5,3,6,6,5,8,1,3))
data

I want to run a chi-square test on the counts across the four traps separately for each species, not between species. The following code does this for a single species, but repeating it by hand for every species is not practical given the size of my original data frame.

chi_species1 <- xtabs(count~trap, data, 
                       subset = species=="species1")
chi_species1
chisq.test(chi_species1)

Thanks for your help!!


Solution

  • base

    df <- data.frame(species = factor(c(rep("species1", 4), rep("species2", 4), rep("species3", 4))),
                       trap = c(rep(c("A","B","C","D"), 3)),
                       count=c(6,3,7,9,5,3,6,6,5,8,1,3))
    df
    #>     species trap count
    #> 1  species1    A     6
    #> 2  species1    B     3
    #> 3  species1    C     7
    #> 4  species1    D     9
    #> 5  species2    A     5
    #> 6  species2    B     3
    #> 7  species2    C     6
    #> 8  species2    D     6
    #> 9  species3    A     5
    #> 10 species3    B     8
    #> 11 species3    C     1
    #> 12 species3    D     3
    
    species <- unique(df$species)
    
    chi_species <- lapply(species, function(x) xtabs(count~trap, df, 
                          subset = species== x))
    
    chi_species <- setNames(chi_species, species)
    
    lapply(chi_species, chisq.test)
    
    #> $species1
    #> 
    #>  Chi-squared test for given probabilities
    #> 
    #> data:  X[[i]]
    #> X-squared = 3, df = 3, p-value = 0.3916
    #> 
    #> 
    #> $species2
    #> 
    #>  Chi-squared test for given probabilities
    #> 
    #> data:  X[[i]]
    #> X-squared = 1.2, df = 3, p-value = 0.753
    #> 
    #> 
    #> $species3
    #> 
    #>  Chi-squared test for given probabilities
    #> 
    #> data:  X[[i]]
    #> X-squared = 6.2941, df = 3, p-value = 0.09815
    

    Created on 2022-04-25 by the reprex package (v2.0.1)
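The list of test objects printed above can get long when there are many species. A minimal sketch, assuming the same example data, that collects the statistic, degrees of freedom and p-value of each test into one data frame; the names `counts_by_species`, `tests` and `results` are illustrative only:

```r
df <- data.frame(
  species = factor(rep(c("species1", "species2", "species3"), each = 4)),
  trap    = rep(c("A", "B", "C", "D"), times = 3),
  count   = c(6, 3, 7, 9, 5, 3, 6, 6, 5, 8, 1, 3)
)

# One count vector per species, named by trap
counts_by_species <- lapply(split(df, df$species),
                            function(d) setNames(d$count, d$trap))

# One goodness-of-fit test per species.
# chisq.test() can warn when expected counts are small; that does not stop the loop.
tests <- lapply(counts_by_species, chisq.test)

# Pull the pieces of each test object into one row per species
results <- data.frame(
  species   = names(tests),
  statistic = vapply(tests, function(t) unname(t$statistic), numeric(1)),
  df        = vapply(tests, function(t) unname(t$parameter), numeric(1)),
  p_value   = vapply(tests, function(t) t$p.value, numeric(1)),
  row.names = NULL
)
results
```

Using `split()` instead of looping over `unique(df$species)` keeps the species names attached to the results automatically, so no separate `setNames()` step on the list is needed.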

  • tidyverse

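    library(dplyr)
    df %>% 
      group_by(species, trap) %>% 
      # total count per species/trap; this summarise() drops the trap grouping
      summarise(count = sum(count)) %>% 
      # the result is still grouped by species, so this runs one test per species
      summarise(pvalue = chisq.test(count)$p.value)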
    
    # A tibble: 3 × 2
      species  pvalue
      <fct>     <dbl>
    1 species1 0.392 
    2 species2 0.753 
    3 species3 0.0981
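As a sanity check on the p-values above, the test for a single species can be recomputed by hand. This is a sketch that assumes the default null hypothesis of `chisq.test()` on a count vector, namely equal probabilities for all traps; the variable names are illustrative:

```r
# Counts for species1 from the example data
observed <- c(A = 6, B = 3, C = 7, D = 9)

# Under equal probabilities, every trap expects the same share of the total
expected <- rep(sum(observed) / length(observed), length(observed))

# Pearson chi-squared statistic and its degrees of freedom
statistic <- sum((observed - expected)^2 / expected)
dof <- length(observed) - 1

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(statistic, df = dof, lower.tail = FALSE)

c(statistic = statistic, df = dof, p_value = p_value)
# statistic is 3 with 3 degrees of freedom, matching the species1 output above
```

Because each species is tested only against its own expected counts, the tests are independent of one another, which is what the question asks for.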