I am reading about χ^2 test and have a contingency table for observed values, and I'd like to calculate "adjusted residuals" according to this guide. I've written the following code and it does the job but I want to avoid the loop at the bottom. I'm sure there is a way, I just can't seem to get it:
import numpy as np
from scipy import stats
O = np.array([[21, 6, 12, 19],[20, 4, 15, 3],
[9,5, 18, 22],[2, 6, 19, 19]])
chi2, p, dof, E = stats.chi2_contingency(O)
# Adjusted residuals here
def res(o, e, rsum, csum):
return (o - e) / np.sqrt( e * (1- (e/rsum)) * (1 - (e/csum)) )
residuals = np.zeros(O.shape)
for r in range(O.shape[0]):
for c in range(O.shape[1]):
rsum = O[r].sum()
csum = O[:, c].sum()
residual = res(O[r][c], E[r][c], rsum, csum)
residuals[r][c] = residual
print(residuals)
Out[430]:
array([[ 2.10317786, -0.04575022, -2.19144827, 0.24489455],
[ 3.59372734, -0.23218785, 0.58057317, -3.82328682],
[-1.83007839, -0.34810505, 0.24583558, 1.71096716],
[-3.81533368, 0.6412914 , 1.54166511, 1.6313671 ]])
Is there some way that I can do this elegantly without the loop?
My solution:
import numpy as np
O = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]])
E = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
o_sum_0 = O.sum(axis=0) # [10,10,10,10]
o_sum_1 = O.sum(axis=1) # [4,8,12,16]
# [4,8,12,16] > (full) >
# [[4,8,12,16],[4,8,12,16],[4,8,12,16],[4,8,12,16]] > (transpose) >
# [[4,4,4,4], [8,8,8,8], [12,12,12,12], [16,16,16,16]]
rsum = np.full(O.shape, o_sum_1).T
# [10,10,10,10] > (full) >
# [[10,10,10,10],[10,10,10,10],[10,10,10,10],[10,10,10,10]]
csum = np.full(O.shape, o_sum_0)
residual = (O - E) / np.sqrt(E * (1 - (E / rsum)) * (1 - (E / csum)))
or with one "simple" line:
residuals = r(O - E) / np.sqrt(E * (1 - (E / np.full(O.shape, O.sum(axis=1)).T)) * (1 - (E / np.full(O.shape, O.sum(axis=0)))))