I am doing experiments on human subjects where they have to perform a piloting task. Their performance is rated by measuring the distance between their trajectory and the center of targets in the sky. I have 8 subjects for the test condition and 8 subjects for the control condition. My data are non-parametric and independent. I am analyzing the statistical difference with a Wilcoxon Rank Sum test in Matlab (ranksum).
However, I would like to know what is the power of my test and how many subjects I would need to have a given power. As my data don’t follow a normal distribution (non-parametric), I cannot use the Matlab function sampsizepwr.
I found in some studies that we could do a bootstrap analysis to obtain a plot of the power regarding to the number of subjects. Matlab has the function bootstrap but I don't understand how to implement it. Does anyone know how to use bootstrap with Matlab to compute the power as a function of the sample size for non-parametric and independent data?
Thank you for your help!
Carine
Assuming you have 2 samples g1
, g2
, you would run:
p = ranksum(g1,g2)
N = 100;
bootstat = bootstrp(N,@ranksum,g1,g2);
Where N
is the number of sampling tries. bootstat
is a list of the ranksum
statistic p-value for all N
samples. The power of the test is the probability that your test reject H0 in those samples:
pow = sum(bootstat<alpha)/N
where alpha
is the significance level (typically 5%).
Edit:
To make things more specific, I simulate some data that have similar properties as yours:
smpsz = [4 8 20 200]; % sample size
for n = 1:numel(smpsz)
g1 = rand(1,smpsz(n)).*repelem([3.36 6.72],smpsz(n)/2)+repelem([0 3.36],smpsz(n)/2);
g2 = rand(1,smpsz(n)).*repelem([8.13 8.13*2],smpsz(n)/2)+repelem([0 8.13],smpsz(n)/2);
N = [5 10 50 100 500]; % no. of tries
for k = 1:numel(N)
bootstat2 = bootstrp(N(k),@ranksum,g2,g1);
bootstat1 = bootstrp(N(k),@ranksum,g1,g2);
pow(1,k) = sum(bootstat1<0.05)/N(k);
pow(2,k) = sum(bootstat2<0.05)/N(k);
end
disp(smpsz(n))
disp(pow)
end
The result:
4
0.2 0.2 0.02 0.05 0.058
0 0.2 0.06 0.05 0.072
8
0.6 0.3 0.7 0.66 0.552
0.6 0.5 0.56 0.58 0.6
20
1 0.8 0.9 0.86 0.908
1 0.7 0.86 0.88 0.898
200
1 1 1 1 1
1 1 1 1 1
you can see above that pow
does not get better (higher) along the rows, i.e. with a greater N
, and also here is no significant difference between every two adjacent rows, i.e. the order of groups does not matter. However, it does increase as sample size (smpsz
) increases from 4 to 200.