This is most likely a very simple question, but I'll ask it anyway since I haven't found an answer. How can I compare the number of "cases" (for example, flu) in two groups, i.e. find out whether the difference between the case counts in the two groups is statistically significant? Can I apply some sort of t-test? Or is it even meaningful to do this kind of comparison?
I'd prefer to do the comparison in R.
A very simple data example:
group1 <- 1000 # size of group 1
group2 <- 1000 # size of group 2
group1_cases <- 550 # number of cases in group 1
group2_cases <- 70 # number of cases in group 2
I think a chi-squared test (chisq.test) is what you are looking for.
group1 <- 1000 # size of group 1
group2 <- 1000 # size of group 2
group1_cases <- 550 # number of cases in group 1
group2_cases <- 70 # number of cases in group 2
# non-cases are derived from the group sizes rather than a hard-coded 1000
group1_noncases <- group1 - group1_cases
group2_noncases <- group2 - group2_cases
# 2 x 2 contingency table: rows are groups, columns are case status
M <- as.table(rbind(c(group1_cases, group1_noncases),
                    c(group2_cases, group2_noncases)))
dimnames(M) <- list(groups = c("1", "2"),
                    cases = c("yes", "no"))
res <- chisq.test(M)
# The null hypothesis, that the proportion of cases is the same in both groups, is rejected:
res
#>
#> Pearson's Chi-squared test with Yates' continuity correction
#>
#> data: M
#> X-squared = 536.33, df = 1, p-value < 2.2e-16
# If both groups had the same proportion of cases, these would be the expected counts:
res$expected
#>       cases
#> groups yes  no
#>      1 310 690
#>      2 310 690
Created on 2021-04-28 by the reprex package (v0.3.0)
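As a side note, for a 2 x 2 table like this one, base R's prop.test takes the counts directly and, with its default continuity correction, should reproduce the same X-squared statistic as chisq.test. It also reports the two sample proportions and a confidence interval for their difference. This is a minimal sketch using the numbers from the question; the names `cases`, `sizes` and `res_prop` are just illustrative.

```r
# Same counts as in the question (illustrative variable names)
cases <- c(550, 70)      # number of cases in group 1 and group 2
sizes <- c(1000, 1000)   # sizes of group 1 and group 2

# Two-sample test for equality of proportions
# (continuity correction is applied by default, as in chisq.test)
res_prop <- prop.test(x = cases, n = sizes)

res_prop$statistic  # X-squared, comparable to the chisq.test output above
res_prop$p.value    # p-value of the test
res_prop$estimate   # observed proportion of cases in each group
res_prop$conf.int   # confidence interval for the difference in proportions
```

Which of the two to use is mostly a matter of convenience: chisq.test wants the full table, prop.test wants cases and group sizes.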
Strictly speaking, t.test is not the correct method for binary (case / non-case) data. However, people do use it for this kind of comparison, and with large samples the p-values are usually very similar.
# t test
# one row per individual: 1 = case, 0 = non-case
dat <- data.frame(groups = c(rep("1", group1), rep("2", group2)),
                  values = c(rep(1, group1_cases),
                             rep(0, group1_noncases),
                             rep(1, group2_cases),
                             rep(0, group2_noncases)))
t.test(dat$values ~ dat$groups)
#>
#> Welch Two Sample t-test
#>
#> data: dat$values by dat$groups
#> t = 27.135, df = 1490.5, p-value < 2.2e-16
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> 0.4453013 0.5146987
#> sample estimates:
#> mean in group 1 mean in group 2
#> 0.55 0.07
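To see why the t-test lands so close here, it may help to rebuild the Welch statistic by hand. For a 0/1 variable the sample mean is the proportion p, and the sample variance is p(1 - p) scaled by n / (n - 1), so the Welch t is almost the same as an unpooled two-proportion z statistic. The sketch below uses the numbers from the question; `t_manual` and `z_unpooled` are names made up for illustration.

```r
n1 <- 1000; n2 <- 1000
p1 <- 550 / n1   # proportion of cases in group 1
p2 <- 70 / n2    # proportion of cases in group 2

# Sample variance of a 0/1 vector, with the n - 1 denominator that var() uses
v1 <- p1 * (1 - p1) * n1 / (n1 - 1)
v2 <- p2 * (1 - p2) * n2 / (n2 - 1)

# Welch t statistic rebuilt from the proportions alone
t_manual <- (p1 - p2) / sqrt(v1 / n1 + v2 / n2)

# Unpooled two-proportion z statistic (n instead of n - 1 in the variances)
z_unpooled <- (p1 - p2) / sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

c(t_manual = t_manual, z_unpooled = z_unpooled)
```

The two values differ only by a factor of sqrt((n - 1) / n) when the groups are the same size, which is negligible at n = 1000. With small groups or very rare cases the approximation gets worse, which is one reason to prefer the chi-squared or proportion test.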