r statistics chi-squared

What is the correct way to test for significant differences between non-numeric data? Which is the correct post-hoc test?


I'm working with non-numeric (categorical) data that looks something like this:

Origin ESBL
Hospital ESBL
Hospital Non-ESBL
Hospital ESBL
City ESBL
Hospital Non-ESBL
City ESBL
Country ESBL
Hospital ESBL

I want to test whether there is a statistical association between the origin and the variable ESBL.

So far I have tried generating a contingency table in R using:

cont_tab<-table(data$Origin, data$ESBL)

and then running a chi-squared test for independence:

chi_test<-chisq.test(cont_tab)

After this, the test indicates that the two variables are indeed not independent:

X-squared = 17.306, df = 2, p-value = 0.0001746

But now I want to know which combinations are responsible for this result (ESBL-Hospital, Non-ESBL-Hospital, ESBL-City and so on).

I have tried running multiple Fisher tests:

library(RVAideMemoire)
multifish<-fisher.multcomp(cont_tab)

But I don't really get what I want:

            ESBL Non-ESBL
  Hospital   46      122
  City       27       21
  Country    56       69

Am I doing anything wrong? Is there a better approach for this?

Thanks!!!


Solution

  • I think the output you are showing is actually cont_tab, not the result of fisher.multcomp. When I run your code, cont_tab looks exactly like the table you posted:

    cont_tab <- table(data$Origin, data$ESBL)
    
    cont_tab
    #>           
    #>            ESBL Non-ESBL
    #>   Hospital   46      122
    #>   City       27       21
    #>   Country    56       69
    

    Whereas, if I run fisher.multcomp on cont_tab, I get:

    library(RVAideMemoire)
    
    fisher.multcomp(cont_tab)
    #> 
    #>         Pairwise comparisons using Fisher's exact test for count data
    #> 
    #> data:  cont_tab
    #> 
    #>         Hospital  City
    #> City    0.001313     -
    #> Country 0.004249 0.234
    #> 
    #> P value adjustment method: fdr
    

    The output shows (as expected) that Hospital differs significantly from both City (adjusted p = 0.0013) and Country (adjusted p = 0.0042), whereas there is no significant difference between City and Country (adjusted p = 0.234).

    Created on 2022-12-13 with reprex v2.0.2
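
As a complement to the pairwise tests, the question of which cells drive the overall chi-squared result can also be explored with the standardized residuals that chisq.test() returns in its stdres component. This is a different technique from fisher.multcomp, shown here only as a sketch; the table is rebuilt from the counts posted in the question, and the "roughly 2 in absolute value" reading is a common rule of thumb rather than a formal test.

```r
# Contingency table rebuilt from the counts shown in the question
cont_tab <- as.table(matrix(
  c(46, 122,
    27,  21,
    56,  69),
  nrow = 3, byrow = TRUE,
  dimnames = list(Origin = c("Hospital", "City", "Country"),
                  ESBL   = c("ESBL", "Non-ESBL"))
))

chi_test <- chisq.test(cont_tab)

# Standardized residuals: (observed - expected) scaled by its standard error.
# Negative values mean fewer observations than expected under independence,
# positive values mean more. Cells with large absolute values (rule of thumb:
# above about 2) contribute most to the overall association.
std_res <- chi_test$stdres
round(std_res, 2)
```

With only two ESBL categories, the two columns of residuals mirror each other, so reading the ESBL column alone is enough.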


    Data inferred from question

    data <- data.frame(
      ESBL = factor(c(rep(c("ESBL", "Non-ESBL"), times = c(46, 122)),   # Hospital
                      rep(c("ESBL", "Non-ESBL"), times = c(27, 21)),    # City
                      rep(c("ESBL", "Non-ESBL"), times = c(56, 69)))),  # Country
      Origin = factor(rep(c("Hospital", "City", "Country"),
                          times = c(168, 48, 125)),
                      levels = c("Hospital", "City", "Country")))
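
If installing RVAideMemoire is not an option, the same kind of pairwise comparison can be sketched in base R: run fisher.test() on each 2 x 2 sub-table and adjust the p-values with p.adjust(). This is an approximation of what fisher.multcomp reports, not its actual source code; the variable names (origin_pairs, raw_p, adj_p) are made up for this example, and the table is rebuilt from the counts in the question.

```r
# Contingency table rebuilt from the counts shown in the question
cont_tab <- as.table(matrix(
  c(46, 122,
    27,  21,
    56,  69),
  nrow = 3, byrow = TRUE,
  dimnames = list(Origin = c("Hospital", "City", "Country"),
                  ESBL   = c("ESBL", "Non-ESBL"))
))

# Every pair of origins, as a list of length-2 character vectors
origin_pairs <- combn(rownames(cont_tab), 2, simplify = FALSE)

# Fisher's exact test on the 2 x 2 sub-table for each pair
raw_p <- sapply(origin_pairs, function(p) fisher.test(cont_tab[p, ])$p.value)
names(raw_p) <- sapply(origin_pairs, paste, collapse = " vs ")

# Correct for multiple comparisons using the false discovery rate,
# the same adjustment method named in the fisher.multcomp output
adj_p <- p.adjust(raw_p, method = "fdr")
adj_p
```

Any other method accepted by p.adjust(), such as "holm" or "bonferroni", can be swapped in if a more conservative correction is preferred.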