
R equivalent of immediate commands in Stata (i.e. tabi ..., chi2)


Apologies in advance for this probably very basic question, but I have been struggling to find out how to do this in R.

When reviewing papers, theses, etc., it is very useful to calculate p-values from aggregate data, i.e. you see a table and wonder whether the p-value was calculated correctly. In Stata it is very easy to run, for instance, a chi-squared test on aggregate data using the immediate commands, for example:

tabi 8 43 \ 2 78, row chi2

gives output

   row |         1          2 |     Total
     1 |         8         43 |        51 
       |     15.69      84.31 |    100.00 

     2 |         2         78 |        80 
       |      2.50      97.50 |    100.00 

 Total |        10        121 |       131 
       |      7.63      92.37 |    100.00 

      Pearson chi2(1) =   7.6805   Pr = 0.006

I struggle to do the same in R using, for instance, chisq.test(). I have tried

chisq.test(c(8, 43, 2, 78))

or

chisq.test(c(8, 43, 2, 78, nrow = 2))

or similar, but it seems to do some completely different calculation...

Chi-squared test for given probabilities
data:  c(8, 43, 2, 78, nrow = 2)
X-squared = 167.94, df = 4, p-value < 2.2e-16

Can anyone help with a "quick-fix" for this?

Thanks in advance

Bjorn


Solution

  • I am not entirely sure what you want to achieve, but I think you are looking for this:

    chisq.test(matrix(c(8, 43, 2, 78), nrow = 2))
    

    In any case, run ?chisq.test to see how the function works, which arguments it expects, and in which order.

    The help page also describes how the function handles its input:

    "If x is a matrix with one row or column, or if x is a vector and y is not given, then a goodness-of-fit test is performed (x is treated as a one-dimensional contingency table). The entries of x must be non-negative integers. In this case, the hypothesis tested is whether the population probabilities equal those in p, or are all equal if p is not given.

    If x is a matrix with at least two rows and columns, it is taken as a two-dimensional contingency table: the entries of x must be non-negative integers. Otherwise, x and y must be vectors or factors of the same length; cases with missing values are removed, the objects are coerced to factors, and the contingency table is computed from these. Then Pearson's chi-squared test is performed of the null hypothesis that the joint distribution of the cell counts in a 2-dimensional contingency table is the product of the row and column marginals."

    Check your example data, e.g. when you run

    is.matrix(c(8, 43, 2, 78, nrow = 2))
    

    it will return

    [1] FALSE
    

    while

    is.matrix(matrix(c(8, 43, 2, 78), nrow = 2))
    

    returns

    [1] TRUE
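
    The first check can be taken one step further. As a small sketch (the names x and gof are only illustrative): inside c(), nrow = 2 is not treated as an argument but as a fifth element named "nrow", which would explain the df = 4 in the output from the question.

    ```r
    # Inside c(), `nrow = 2` becomes a fifth element named "nrow".
    x <- c(8, 43, 2, 78, nrow = 2)
    length(x)   # 5
    names(x)    # "" "" "" "" "nrow"

    # Five categories give 5 - 1 = 4 degrees of freedom
    # in the goodness-of-fit test.
    gof <- chisq.test(x)
    gof$parameter   # df = 4
    gof$statistic   # about 167.94, as in the question
    ```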
    

    So the example you gave was a vector, not a matrix. According to the description quoted above, chisq.test() performs a "goodness-of-fit test" on a vector, whereas it performs "Pearson's chi-squared test" on a matrix with at least two rows and columns.
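
    To get closer to the Stata output in the question, two defaults seem worth overriding. This is a sketch, and the names tab and res are only illustrative: matrix() fills column by column unless byrow = TRUE is given, and chisq.test() applies Yates' continuity correction to 2 x 2 tables unless correct = FALSE is given.

    ```r
    # Enter the counts row by row, as in `tabi 8 43 \ 2 78`.
    tab <- matrix(c(8, 43, 2, 78), nrow = 2, byrow = TRUE)

    # Row percentages, comparable to Stata's `row` option.
    round(100 * prop.table(tab, margin = 1), 2)

    # Pearson chi-squared test without Yates' continuity correction.
    # R may warn that the approximation is doubtful here,
    # because one expected count is below 5.
    res <- chisq.test(tab, correct = FALSE)
    res$statistic   # about 7.68
    res$p.value     # about 0.006
    ```

    With expected counts this small, fisher.test(tab) may be worth running as a cross-check.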