Introduction -->
I am trying to reproduce the same results in R and in Matlab, of the chi-squared test on contingency tables, for two observed frequency datasets, x
and y
(therefore, x
and y
are my inputs). In particular, I am focusing on the special case where x=y
.
In R, I get the following:
> x = c(1000,100,10,1)
> y = c(1000,100,10,1)
> chisq.test(x,y)
Pearson's Chi-squared test
data: x and y
X-squared = 12, df = 9, p-value = 0.2133
In Matlab, I get the following:
>> x = [1000 100 10 1];
>> y = [1000 100 10 1];
>> [~,chi2,p] = crosstab(x,y)
chi2 =
12
p =
0.21331
So far so god. However, intuitively, if I perform a chi-squared test for the same observed frequency datasets (i.e. for the x=y
case), I would expect to get a test statistic equal to 0 and a p-value equal to 1. This can be achieved in R as follows:
> x = c(1000,100,10,1)
> y = c(1000,100,10,1)
> chisq.test(cbind(x, y))
Pearson's Chi-squared test
data: cbind(x, y)
X-squared = 0, df = 3, p-value = 1
Question -->
How can I get the same in Matlab, i.e. a test statistic equal to 0 and a p-value equal to 1 for the x=y
case?
Additional note -->
If it can be useful, I have tried to play a bit with the Matlab code, and I was able to reproduce the [~,chi2,p] = crosstab(x,y)
(but obviously not the equivalent of the R code for chisq.test(cbind(x, y))
!):
% Inputs
x = [1000 100 10 1];
y = [1000 100 10 1];
% Chi-squared test on contingency tables
[unique_x,ix,jx] = unique(x);
[unique_y,iy,jy] = unique(y);
OT = zeros(numel(unique_x),numel(unique_y));
for i = 1:numel(x)
OT(jx(i),jy(i)) = OT(jx(i),jy(i)) + 1;
end
R = sum(OT,2);
C = sum(OT);
n = sum(sum(OT));
ET = (R*C)./n;
chi2 = (OT - ET).^ 2 ./ ET;
chi2 = sum(chi2(:)) % <-- test statistic
df = (numel(R)-1)*(numel(C)-1) % <-- degrees of freedom
p = gammainc(chi2/2,df/2,'upper') % <-- p-value
OT % <-- OT = Observed frequency contingency table
ET % <-- ET = Expected frequency contingency table
Which returns the following:
chi2 =
12
df =
9
p =
0.21331
OT =
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
ET =
0.25 0.25 0.25 0.25
0.25 0.25 0.25 0.25
0.25 0.25 0.25 0.25
0.25 0.25 0.25 0.25
To replicate the results of chisq.test(cbind(x, y))
where the test statistic is 0
and the p-value is 1
for identical datasets. You can create a custom implementation that allows you to construct the contingency table directly, as you did in R.
Once you define your inputs:
% Inputs
x = [1000 100 10 1];
y = [1000 100 10 1];
You can do it in MATLAB via the script:
% Combine x and y into a 2-column matrix
observed = [x(:) y(:)];
% Calculate the row and column totals
row_totals = sum(observed, 2);
col_totals = sum(observed, 1);
% Calculate the grand total
grand_total = sum(row_totals);
% Compute the expected frequency table
expected = (row_totals * col_totals) / grand_total;
% Calculate the chi-squared statistic
chi2 = sum((observed(:) - expected(:)).^2 ./ expected(:));
% Degrees of freedom
df = (size(observed, 1) - 1) * (size(observed, 2) - 1);
% Compute the p-value
p = 1 - chi2cdf(chi2, df);
% Display the results
disp(['Chi-squared statistic: ', num2str(chi2)])
disp(['Degrees of freedom: ', num2str(df)])
disp(['p-value: ', num2str(p)])
In this case the observed matrix is a 2-column matrix where each row contains two variables x
and y
. The expected frequency table is calculated based on the row and column totals.
The chi-squared statistic is calculated using the recorded and expected frequencies, the p-value is computed using the chi-squared cumulative distribution function.