
Proper way of using the chi-square test (GOF) in R


I am trying to determine whether today's counts really differ from yesterday's across four categories.

My counted data is:

data <- data.frame(yesterday = c(10741, 1575, 174, 2),
                   today = c(11987, 1705, 211, 2),
                   row.names = c("a", "b", "c", "unknown"))

> data
        yesterday today
a           10741 11987
b            1575  1705
c             174   211
unknown         2     2

So I ran the chi-square test from the stats package this way:

stats::chisq.test(x = data$yesterday, y = data$today)

and the result is:

Pearson's Chi-squared test

data:  data$yesterday and data$today
X-squared = 12, df = 9, p-value = 0.2133

My issue is that I assumed this would be equivalent to:

stats::chisq.test(data)

But, as you can see, the result is completely different:

Pearson's Chi-squared test

data:  data
X-squared = 1.3846, df = 3, p-value = 0.7092

So which is the proper way of using this test to compare two samples from the same data set?


Solution

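  • The two calls are not equivalent. When both x and y are supplied as vectors, chisq.test coerces each of them to a factor and cross-tabulates the two, so in the first call your counts are treated as category labels rather than as counts. That produces a 4 x 4 table with a single observation in each row, which is where df = 9 comes from. When you pass the whole table (a matrix or data frame of counts), it is used directly as a 4 x 2 contingency table, which gives df = 3. So the version where you supply the contingency table, chisq.test(data), should be the correct one; it also corresponds to the example from the documentation.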
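
To make the difference concrete, here is a minimal sketch that rebuilds the table each call ends up testing. It reuses the counts from the question; the names tab_xy, res_xy and res_tab are only illustrative. Warnings are suppressed purely to keep the output short, since chisq.test may warn that the chi-squared approximation is unreliable when expected counts are small, as they are for the "unknown" row.

```r
# Counts from the question.
data <- data.frame(yesterday = c(10741, 1575, 174, 2),
                   today     = c(11987, 1705, 211, 2),
                   row.names = c("a", "b", "c", "unknown"))

# What chisq.test(x = data$yesterday, y = data$today) effectively tests:
# both vectors become factors, so every count is a category label and the
# cross-tabulation is a 4 x 4 table holding only 4 observations in total.
tab_xy <- table(factor(data$yesterday), factor(data$today))
print(dim(tab_xy))   # 4 4
print(sum(tab_xy))   # 4

res_xy <- suppressWarnings(stats::chisq.test(tab_xy))
print(res_xy$parameter)   # df = 9, as in the first output of the question

# Passing the counts as a matrix keeps them as counts: a 4 x 2 table.
res_tab <- suppressWarnings(stats::chisq.test(as.matrix(data)))
print(res_tab$parameter)  # df = 3, as in the second output of the question
print(res_tab$expected)   # expected counts under "same distribution both days"
```

Inspecting res_tab$expected is a quick way to see which cells have small expected counts; whether that matters for your conclusion is a separate judgment call.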