
Looping ranksum over many variables in Stata


I have some data on two different groups of patients, automatically exported from a diagnostic tool.

Variables are named automatically by the diagnostic tool (e.g. L1DensityWholeImage, L1WholeImageSHemi, L1WholeImageIHemi, L1WholeETDRS, [...], DeepL2StartLayer, L2Startoffsetum, L2EndLayer, [...], Perimeter, AcircularityIndex).

I have to perform a rank-sum test (Mann-Whitney U test) on all the variables (more than 80) by group.

Normally, I would have to write each analysis separately, like this:

ranksum L1DensityWholeImage, by(Group)

ranksum L1WholeImageSHemi, by(Group)

ranksum L1WholeImageIHemi, by(Group)

ranksum L1WholeETDRS, by(Group)

Is there any way to write the command with a varlist? And ideally to obtain a single output listing all the P-values?

e.g. something like: ranksum L1DensityWholeImage L1WholeImageSHemi L1WholeImageIHemi L1WholeETDRS DeepL2StartLayer L2Startoffsetum L2EndLayer Perimeter AcircularityIndex, by(Group)


Solution

  • The short answer is to write a loop and customise the output.

    Here is a token example which you can run.

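    sysuse auto, clear 
    
    * loop over the variables, suppressing the default ranksum output
    foreach v of varlist mpg price weight length displacement { 
        quietly ranksum `v', by(foreign) porder 
        * two-sided P-value from the normal approximation
        scalar pval = 2*normal(-abs(r(z)))
        * name, P-value in fixed and scientific format, P(first group > second)
        di "`v'{col 14}" %05.3f pval " " %6.4e pval  "   " %05.3f r(porder) 
    } 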
    

    Output is

    mpg          0.002  1.9e-03   0.271
    price        0.298  3.0e-01   0.423
    weight       0.000  3.8e-07   0.875
    length       0.000  9.4e-07   0.862
    displacement 0.000  1.1e-08   0.921
    

    Notes:

    1. If your variable names are longer, they will need more space.

    2. Displaying P-values with a fixed number of decimal places is uninformative whenever all displayed digits are zero (as with 0.000 above). The code therefore shows two forms of output: fixed decimals and scientific notation.

    3. The probability that values for the first group exceed those for the second group (the last column, from the porder option) is very helpful in interpretation. Further summary statistics could be added.

    4. Naturally a presentable table needs more header lines, best given with display.
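
    Notes 1 and 4 could be handled along the following lines. This is only a sketch under stated assumptions: it uses the same auto data, takes the variables from a range (price-gear_ratio, all numeric in that dataset), sizes the first column from the longest variable name, and prints two header lines with display. The local macro names (vars, width, c1, c2, c3), the column spacing and the header wording are illustrative choices, not part of the example above.

    ```stata
    sysuse auto, clear

    * Expand the range into an explicit list of variable names
    unab vars : price-gear_ratio

    * Width of the first column: length of the longest variable name
    local width = 0
    foreach v of local vars {
        local width = max(`width', strlen("`v'"))
    }

    * Starting positions of the three result columns (illustrative spacing)
    local c1 = `width' + 3
    local c2 = `c1' + 10
    local c3 = `c2' + 12

    * Header lines
    display "Variable" _col(`c1') "P" _col(`c2') "P (sci.)" _col(`c3') "P(1st > 2nd)"
    display _dup(`=`c3' + 12') "-"

    * One row per variable
    foreach v of local vars {
        quietly ranksum `v', by(foreign) porder
        * two-sided P-value from the normal approximation
        scalar pval = 2*normal(-abs(r(z)))
        display "`v'" _col(`c1') %5.3f pval _col(`c2') %9.2e pval _col(`c3') %5.3f r(porder)
    }
    ```

    With the data in the question, the unab line might instead name the first and last variables of the exported block and by(foreign) would become by(Group). That assumes the variables are stored next to each other in the dataset and are all numeric; otherwise the names would need to be listed explicitly.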