
Why does a χ² test in scipy return a smaller test statistic?


I'm calculating the chi2 test statistic of a small contingency table:

import numpy as np
import scipy.stats

obs = np.array([
    [652, 576],
    [1348, 924]
])

When I calculate it by hand with the formula illustrated on Wikipedia, Σ (Oᵢ − Eᵢ)² / Eᵢ, I get ~12.660. However, the scipy.stats.chi2_contingency function returns a different test statistic:

>>> scipy.stats.chi2_contingency(obs)
 (12.40676502094132, 0.00042778128638335943, 1, array([[  701.71428571,  526.28571429],
   [ 1298.28571429,   973.71428571]])) 

I've compared the expected frequencies in the result with mine and they are identical. Entering my data into an online calculator (for example http://www.socscistatistics.com/tests/chisquare2/default2.aspx) also gives results identical to my own.

What magic is this function doing to reduce the test statistic?
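For reference, the by-hand calculation described in the question can be sketched with NumPy: expected counts come from the row and column totals, and the statistic is Pearson's Σ (Oᵢ − Eᵢ)² / Eᵢ with no correction of any kind. The helper name `pearson_chi2` is hypothetical and not part of SciPy.

```python
import numpy as np

def pearson_chi2(observed):
    """Uncorrected Pearson chi-squared statistic and expected counts (hypothetical helper)."""
    observed = np.asarray(observed, dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)  # shape (r, 1)
    col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, c)
    # Expected count under independence: row total * column total / grand total
    expected = row_totals * col_totals / observed.sum()
    # Sum of (O - E)^2 / E over every cell
    stat = ((observed - expected) ** 2 / expected).sum()
    return stat, expected

obs = np.array([
    [652, 576],
    [1348, 924]
])

stat, expected = pearson_chi2(obs)
print(stat)      # about 12.660, the hand-calculated value
print(expected)  # the same expected frequencies that chi2_contingency reports
```

This reproduces the ~12.660 figure, which confirms that the expected frequencies are not where the discrepancy comes from.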


Solution

  • By default correction=True, which means that Yates' correction for continuity is applied when the number of degrees of freedom is 1 (as it is here). If you set correction=False, the correction is not applied and you get 12.660... as the test statistic:

    >>> scipy.stats.chi2_contingency(obs, correction=False)
    (12.660142450795965,
     0.00037353375362753034,
     1,
     array([[  701.71428571,   526.28571429],
            [ 1298.28571429,   973.71428571]]))
    

    The documentation gives the following further information for the correction parameter and summarises Yates' correction:

    If True, and the degrees of freedom is 1, apply Yates’ correction for continuity. The effect of the correction is to adjust each observed value by 0.5 towards the corresponding expected value.
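To see where 12.4067… comes from, here is a sketch that applies the correction as the quoted documentation describes: each observed count is moved 0.5 toward its expected count before Pearson's statistic is computed. The function name `yates_chi2` is hypothetical. Capping the shift so a count never moves past its expected value is an assumption about edge-case handling; it makes no difference here, where every |Oᵢ − Eᵢ| is about 49.71. The p-value uses the fact that a χ² variable with 1 degree of freedom has survival function erfc(√(x/2)).

```python
import math
import numpy as np

def yates_chi2(observed):
    """Pearson chi-squared with Yates' continuity correction (hypothetical helper, 2x2 tables)."""
    observed = np.asarray(observed, dtype=float)
    expected = (observed.sum(axis=1, keepdims=True)
                * observed.sum(axis=0, keepdims=True) / observed.sum())
    diff = expected - observed
    # Move each observed count 0.5 toward its expected count,
    # capped so it never overshoots (assumed edge-case handling).
    shift = np.sign(diff) * np.minimum(0.5, np.abs(diff))
    adjusted = observed + shift
    stat = ((adjusted - expected) ** 2 / expected).sum()
    # Survival function of chi-squared with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

obs = np.array([
    [652, 576],
    [1348, 924]
])

stat, p = yates_chi2(obs)
print(stat, p)  # should match chi2_contingency(obs) with its defaults
```

For a 2×2 table this is the same as summing (|Oᵢ − Eᵢ| − 0.5)² / Eᵢ, which is why the corrected statistic is always a little smaller than the uncorrected one.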