Search code examples
matlabstatistics-bootstrap

How to implement the function bootstrp to obtain a plot of the power regarding the number of observations of a statistical test


I am doing experiments on human subjects where they have to perform a piloting task. Their performance is rated by measuring the distance between their trajectory and the center of targets in the sky. I have 8 subjects for the test condition and 8 subjects for the control condition. My data are non-parametric and independent. I am analyzing the statistical difference with a Wilcoxon Rank Sum test in Matlab (ranksum).

However, I would like to know what is the power of my test and how many subjects I would need to have a given power. As my data don’t follow a normal distribution (non-parametric), I cannot use the Matlab function sampsizepwr.

I found in some studies that we could do a bootstrap analysis to obtain a plot of the power regarding to the number of subjects. Matlab has the function bootstrap but I don't understand how to implement it. Does anyone know how to use bootstrap with Matlab to compute the power as a function of the sample size for non-parametric and independent data?

Thank you for your help!

Carine


Solution

  • Assuming you have 2 samples g1, g2, you would run:

    p = ranksum(g1,g2)
    N = 100;
    bootstat = bootstrp(N,@ranksum,g1,g2);
    

    Where N is the number of sampling tries. bootstat is a list of the ranksum statistic p-value for all N samples. The power of the test is the probability that your test reject H0 in those samples:

    pow = sum(bootstat<alpha)/N
    

    where alpha is the significance level (typically 5%).


    Edit:

    To make things more specific, I simulate some data that have similar properties as yours:

    smpsz = [4 8 20 200]; % sample size
    for n = 1:numel(smpsz)
        g1 = rand(1,smpsz(n)).*repelem([3.36 6.72],smpsz(n)/2)+repelem([0 3.36],smpsz(n)/2);
        g2 = rand(1,smpsz(n)).*repelem([8.13 8.13*2],smpsz(n)/2)+repelem([0 8.13],smpsz(n)/2);
        N = [5 10 50 100 500]; % no. of tries
        for k = 1:numel(N)
            bootstat2 = bootstrp(N(k),@ranksum,g2,g1);
            bootstat1 = bootstrp(N(k),@ranksum,g1,g2);
            pow(1,k) = sum(bootstat1<0.05)/N(k);
            pow(2,k) = sum(bootstat2<0.05)/N(k);
        end
        disp(smpsz(n))
        disp(pow)
    end
    

    The result:

     4
          0.2          0.2         0.02         0.05        0.058
            0          0.2         0.06         0.05        0.072
     8
          0.6          0.3          0.7         0.66        0.552
          0.6          0.5         0.56         0.58          0.6
    20
            1          0.8          0.9         0.86        0.908
            1          0.7         0.86         0.88        0.898
    200
     1     1     1     1     1
     1     1     1     1     1
    

    you can see above that pow does not get better (higher) along the rows, i.e. with a greater N, and also here is no significant difference between every two adjacent rows, i.e. the order of groups does not matter. However, it does increase as sample size (smpsz) increases from 4 to 200.