r, while-loop, logic, control-flow

Writing a nested while loop in R


I know there are already numerous questions about while loops in R, but I have looked through most of them and none seem to address this issue.

I am running a simulation study on a variable (vanq) that can't be accurately simulated. So, instead of randomly generating values of vanq for two groups and then testing the robustness of various tests, I am using a large dataset of vanq observations and randomly assigning groups to it (basically doing the same thing, but backwards). To do this properly, I need to generate groups that meet all of the following conditions:

  1. The mean vanq values of each group differ by less than 0.0001
  2. The median vanq values of each group differ by less than 0.0001 (optimally 0)
  3. The three tests that I am using all give p.values > 0.5

So far, the code I have is this:

#generate two random groups of roughly equal size
mydata$X.5.NS = rbinom(nrow(mydata),1,0.5) 

while(

    #any of the tests give p.values less than 0.5
    min(
        t.test(mydata$vanq~mydata$X.5.NS, var.equal = TRUE)$p.value,
        t.test(mydata$vanq~mydata$X.5.NS, var.equal = FALSE)$p.value,
        wilcox.test(mydata$vanq~mydata$X.5.NS)$p.value) < 0.5 |

    # or the means differ by more than 0.0001
    abs(mean(mydata$vanq[ mydata$X.5.NS == 0]) - 
        mean(mydata$vanq[ mydata$X.5.NS == 1])) > 0.0001 | 

    #or the medians differ by more than 0
    abs(median(mydata$vanq[ mydata$X.5.NS == 0]) - 
        median(mydata$vanq[ mydata$X.5.NS == 1])) > 0
)

{
#re-assign the random groups
mydata$X.5.NS = rbinom(nrow(mydata),1,0.5)
}

However, it takes over an hour to meet these conditions, because each round of tests takes about 12 seconds and it takes a couple hundred tries to meet all the conditions. Normally I would just let it run, but I need to do this for three more groups, and then repeat the procedure until the means differ by > 1, the medians differ by > 1, and all p.values are < 0.05, which takes considerably longer.

What I would like to do is something like this:

while(
#the means differ by more than 0.0001
    abs(mean(mydata$vanq[ mydata$X.5.NS == 0]) - 
        mean(mydata$vanq[ mydata$X.5.NS == 1])) > 0.0001 | 

    #or the medians differ by more than 0
    abs(median(mydata$vanq[ mydata$X.5.NS == 0]) - 
        median(mydata$vanq[ mydata$X.5.NS == 1])) > 0
)

{
#re-assign the random groups
mydata$X.5.NS = rbinom(nrow(mydata),1,0.5)
}

#once the above conditions have been met, then perform the tests,
if(min(
       t.test(mydata$vanq~mydata$X.5.NS, var.equal = TRUE)$p.value,
       t.test(mydata$vanq~mydata$X.5.NS, var.equal = FALSE)$p.value,
       wilcox.test(mydata$vanq~mydata$X.5.NS)$p.value) < 0.5) 
{
#if any of the p.values were < 0.5, go back to the top of the while loop
}

The idea is that by only testing once the mean and median conditions have been met, I can speed this process up a lot. I have tried adding various other flow controls (if, break, next, etc.) without luck. What I really need is a go to line command, but that doesn't seem to exist in R. Any help is greatly appreciated.

Here's a flow chart of the process I'm trying to code.


Solution

  • I’m honestly not sure what your control flow is but maybe this is what you need?

    while (min(t.test(mydata$vanq~mydata$X.5.NS, var.equal = TRUE)$p.value,
               t.test(mydata$vanq~mydata$X.5.NS, var.equal = FALSE)$p.value,
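               wilcox.test(mydata$vanq~mydata$X.5.NS)$p.value) < 0.5) {
        # draw a fresh assignment on each pass; without this, a grouping that
        # passes the mean/median checks but fails the tests is never replaced
        mydata$X.5.NS = rbinom(nrow(mydata), 1, 0.5)
        while (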
            # the means differ by more than 0.0001
            abs(mean(mydata$vanq[ mydata$X.5.NS == 0]) - 
            mean(mydata$vanq[ mydata$X.5.NS == 1])) > 0.0001 || 
    
            # or the medians differ by more than 0
            abs(median(mydata$vanq[ mydata$X.5.NS == 0]) - 
            median(mydata$vanq[ mydata$X.5.NS == 1])) > 0
        ) {
            # re-assign the random groups
            mydata$X.5.NS = rbinom(nrow(mydata), 1, 0.5)
        }
    }
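
Since R has no goto, one way to express "cheap checks first, expensive tests only afterwards" is a single repeat loop with next and break. The following is a minimal sketch, not a definitive implementation. The function name find_balanced_groups, its arguments (mean_tol, median_tol, p_min, max_tries) and the toy data are assumptions made for illustration; only the checks and the three tests come from the question.

```r
# Hypothetical helper: returns a 0/1 grouping vector that passes the cheap
# mean/median checks and then the three tests, or NULL if max_tries runs out.
find_balanced_groups <- function(vanq, mean_tol = 0.0001, median_tol = 0,
                                 p_min = 0.5, max_tries = 10000) {
  tries <- 0
  repeat {
    tries <- tries + 1
    if (tries > max_tries) return(NULL)

    # draw a fresh random assignment on every pass
    g <- rbinom(length(vanq), 1, 0.5)

    # the tests need at least two observations in each group
    if (sum(g == 0) < 2 || sum(g == 1) < 2) next

    # cheap checks first: skip the expensive tests when either one fails
    if (abs(mean(vanq[g == 0]) - mean(vanq[g == 1])) > mean_tol) next
    if (abs(median(vanq[g == 0]) - median(vanq[g == 1])) > median_tol) next

    # expensive tests run only for candidates that passed the cheap checks
    p <- min(t.test(vanq ~ g, var.equal = TRUE)$p.value,
             t.test(vanq ~ g, var.equal = FALSE)$p.value,
             wilcox.test(vanq ~ g)$p.value)
    if (p >= p_min) return(g)
  }
}

# Toy usage with simulated data and loose tolerances (assumed values,
# chosen so the example finishes quickly)
set.seed(1)
vanq <- rnorm(200)
g <- find_balanced_groups(vanq, mean_tol = 0.05, median_tol = 0.1)
```

Here next plays the role of "go back to the top", and the max_tries cap keeps the loop from running forever if the conditions are too strict for the data.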
    

    The whole thing can be made more readable by wrapping the tests in function calls:

    while (any_significant_p_value(mydata, alpha = 0.5)) {
        # fresh assignment on every pass of the outer loop
        mydata$X.5.NS = rbinom(nrow(mydata), 1, 0.5)
        while (mean_difference(mydata) > 0.0001 || median_difference(mydata) > 0) {
            mydata$X.5.NS = rbinom(nrow(mydata), 1, 0.5)
        }
    }
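
The helper functions used above are not defined in the answer. One possible way to write them is sketched below, assuming the data frame has a numeric column vanq and a 0/1 grouping column X.5.NS as in the question. The function bodies and the toy data frame are illustrative assumptions.

```r
# Hypothetical helpers; `data` is assumed to have a numeric column `vanq`
# and a 0/1 grouping column `X.5.NS`.

# absolute difference between the two group means
mean_difference <- function(data) {
  abs(mean(data$vanq[data$X.5.NS == 0]) - mean(data$vanq[data$X.5.NS == 1]))
}

# absolute difference between the two group medians
median_difference <- function(data) {
  abs(median(data$vanq[data$X.5.NS == 0]) - median(data$vanq[data$X.5.NS == 1]))
}

# TRUE if any of the three tests gives a p-value below alpha
any_significant_p_value <- function(data, alpha) {
  p <- c(t.test(vanq ~ X.5.NS, data = data, var.equal = TRUE)$p.value,
         t.test(vanq ~ X.5.NS, data = data, var.equal = FALSE)$p.value,
         wilcox.test(vanq ~ X.5.NS, data = data)$p.value)
  min(p) < alpha
}

# Toy usage on simulated data (not the real vanq observations)
set.seed(42)
toy <- data.frame(vanq = rnorm(100))
toy$X.5.NS <- rep(c(0, 1), each = 50)
mean_difference(toy)
median_difference(toy)
any_significant_p_value(toy, alpha = 0.5)
```

Keeping the cheap summaries and the expensive tests in separate functions makes it easy to call the tests only after the mean and median checks have passed.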