I'm working with non-numeric data that looks something like this:
| Origin   | ESBL     |
|----------|----------|
| Hospital | ESBL     |
| Hospital | Non-ESBL |
| Hospital | ESBL     |
| City     | ESBL     |
| Hospital | Non-ESBL |
| City     | ESBL     |
| Country  | ESBL     |
| Hospital | ESBL     |
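As a small illustration of how `table()` turns rows like these into a contingency table, here is a sketch that uses only the eight rows shown above. The full data set is larger, so these counts are not the real ones, and the names `sample_rows` and `sample_tab` are illustrative.

```r
# The eight example rows shown above (illustrative subset, not the full data)
sample_rows <- data.frame(
  Origin = c("Hospital", "Hospital", "Hospital", "City",
             "Hospital", "City", "Country", "Hospital"),
  ESBL   = c("ESBL", "Non-ESBL", "ESBL", "ESBL",
             "Non-ESBL", "ESBL", "ESBL", "ESBL")
)

# table() cross-tabulates the two variables: one row per Origin,
# one column per ESBL status, each cell counting the matching rows
sample_tab <- table(sample_rows$Origin, sample_rows$ESBL)
print(sample_tab)
```

Because `Origin` is a plain character vector here, the rows come out in alphabetical order (City, Country, Hospital); converting it to a factor with explicit `levels` controls that order.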
I want to test whether there is a statistical association between the origin and the variable ESBL.
So far I have tried generating a contingency table in R using:
```r
cont_tab <- table(data$Origin, data$ESBL)
```
and then running a chi-squared test for independence:
```r
chi_test <- chisq.test(cont_tab)
```
The result indicates that the two variables are not independent, i.e. there is an association (the p-value is far below 0.05):
```
X-squared = 17.306, df = 2, p-value = 0.0001746
```
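One common way to see which cells drive the chi-squared statistic is to look at what `chisq.test()` returns besides the p-value: the expected counts, the Pearson residuals and the standardized residuals. The sketch below rebuilds `cont_tab` from the counts reported in this thread so it runs on its own; `contrib` is an illustrative name, and the |residual| > 2 cut-off is a rough rule of thumb, not a multiplicity-adjusted test.

```r
# Counts from the contingency table reported in this thread
cont_tab <- as.table(matrix(
  c(46, 122,
    27,  21,
    56,  69),
  nrow = 3, byrow = TRUE,
  dimnames = list(Origin = c("Hospital", "City", "Country"),
                  ESBL   = c("ESBL", "Non-ESBL"))
))

# No Yates continuity correction is applied because the table is larger than 2x2
chi_test <- chisq.test(cont_tab)
chi_test

# Expected counts under independence: row total * column total / grand total
chi_test$expected

# Each cell's contribution to X-squared is (observed - expected)^2 / expected,
# i.e. the squared Pearson residual; the contributions sum to the statistic
contrib <- chi_test$residuals^2
round(contrib, 2)

# Standardized (adjusted) residuals are approximately N(0, 1) under independence;
# cells with |value| > 2 deviate more than chance alone would usually produce
round(chi_test$stdres, 2)
```

A negative standardized residual means fewer observations than expected under independence. For example, the Hospital/ESBL cell has 46 observed isolates against an expected count of about 63.6, so its residual is negative.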
Now I want to know which combinations (ESBL-Hospital, Non-ESBL-Hospital, ESBL-City and so on) are responsible for this result.
I have tried running pairwise Fisher's exact tests:
```r
library(RVAideMemoire)
multifish <- fisher.multcomp(cont_tab)
```
But I don't really get what I want:
```
         ESBL Non-ESBL
Hospital   46      122
City       27       21
Country    56       69
```
Am I doing anything wrong? Is there a better approach for this?
Thanks!!!
I think the "final result" you are showing is actually `cont_tab`. When I run your code, `cont_tab` looks like the result you are showing as the output from `fisher.multcomp`:
```r
cont_tab <- table(data$Origin, data$ESBL)
cont_tab
#>
#>            ESBL Non-ESBL
#>   Hospital   46      122
#>   City       27       21
#>   Country    56       69
```
Whereas if I run `fisher.multcomp` on `cont_tab`, I get:
```r
library(RVAideMemoire)
fisher.multcomp(cont_tab)
#>
#>         Pairwise comparisons using Fisher's exact test for count data
#>
#> data:  cont_tab
#>
#>         Hospital City
#> City    0.001313 -
#> Country 0.004249 0.234
#>
#> P value adjustment method: fdr
```
As expected, `Hospital` is significantly different from both `City` and `Country` (adjusted p = 0.0013 and 0.0042), but there is no significant difference between `City` and `Country` (adjusted p = 0.234).
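To see what `fisher.multcomp` is doing, the same kind of pairwise comparison can be written out in base R: run `fisher.test()` on the 2x2 sub-table for each pair of origins, then adjust the p-values with `p.adjust(method = "fdr")`, the adjustment reported in the output above. This is a sketch that assumes `fisher.multcomp` compares rows pairwise in this way; `pairs`, `raw_p`, `adj_p` and `esbl_share` are illustrative names. The last step computes row proportions to show the direction of each difference.

```r
# Counts from the contingency table shown above
cont_tab <- as.table(matrix(
  c(46, 122,
    27,  21,
    56,  69),
  nrow = 3, byrow = TRUE,
  dimnames = list(Origin = c("Hospital", "City", "Country"),
                  ESBL   = c("ESBL", "Non-ESBL"))
))

# All pairs of Origin levels (rows of the table)
pairs <- combn(rownames(cont_tab), 2, simplify = FALSE)

# Fisher's exact test on the 2x2 sub-table for each pair of rows
raw_p <- sapply(pairs, function(p) fisher.test(cont_tab[p, ])$p.value)
names(raw_p) <- sapply(pairs, paste, collapse = "-")

# Benjamini-Hochberg ("fdr") adjustment for the three comparisons
adj_p <- p.adjust(raw_p, method = "fdr")
round(adj_p, 6)

# Share of ESBL and Non-ESBL within each origin (each row sums to 1)
esbl_share <- prop.table(cont_tab, margin = 1)
round(esbl_share, 3)
```

With only two ESBL categories, comparing two origins is the same as comparing their ESBL proportions, so the row proportions show which origin in each significant pair has the higher ESBL share.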
Created on 2022-12-13 with reprex v2.0.2
Data inferred from the question:
```r
data <- data.frame(
  ESBL = factor(c(rep(c("ESBL", "Non-ESBL"), times = c(46, 122)),
                  rep(c("ESBL", "Non-ESBL"), times = c(27, 21)),
                  rep(c("ESBL", "Non-ESBL"), times = c(56, 69)))),
  Origin = factor(rep(c("Hospital", "City", "Country"),
                      times = c(168, 48, 125)),
                  levels = c("Hospital", "City", "Country"))
)
```