I'm trying to do a brute-force head-to-head comparison of several statistical tests on the same simulated datasets. I'm going to generate several thousand 'control' and several thousand 'experimental' populations and run the same tests on each set. The wrapper that calls the tests is going to be called thousands of times.
First, some details on the setup; my questions follow below.
I already have the simulated populations, and will use the appropriate apply function to pass the control and corresponding experimental observations to the wrapper.
The wrapper will take no arguments other than the control and experimental observations (call them xx and yy). Everything else will be hardcoded within the wrapper, to minimize the overhead of flow-control logic and of copying data between environments.
Each function to be called will go on a separate line, in a consistent format, in order of dependency (in the sense that, for example, cox.zph depends on there already being a coxph object, so coxph() will be called before cox.zph()). Each call will be wrapped in try(), and if a function fails, the functions that depend on its output will first test whether the returned object has try-error as its first class and, if it does, substitute some kind of placeholder value.
The block of function calls will be followed by a long c() statement, with each item extracted from its respective fit object on a separate line. Here too, if the source object turns out to be a try-error or a placeholder, an NA goes into that output slot.
This way, the whole run isn't aborted if some of the functions fail, and the output from each simulation is a numeric vector of the same length, suitable for capturing to a matrix.
Depending on the goals of a given set of simulations, I can comment out or insert additional tests and results as needed.
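A minimal sketch of the wrapper pattern described above, with t.test() and wilcox.test() standing in for the real battery of tests (the actual calls, and the statistics extracted, are placeholders):

```r
# Hypothetical wrapper: xx = control observations, yy = experimental ones.
# Everything else is hardcoded; each call is wrapped in try() so that one
# failure doesn't abort the whole run.
simwrapper <- function(xx, yy) {
  tt <- try(t.test(xx, yy), silent = TRUE)
  wt <- try(wilcox.test(xx, yy), silent = TRUE)

  # Long c() statement: one extracted item per line, NA if the source
  # object is a try-error.
  c(
    t.p = if (inherits(tt, "try-error")) NA else tt$p.value,
    w.p = if (inherits(wt, "try-error")) NA else wt$p.value
  )
}
```

Running this over all simulations with, e.g., mapply(simwrapper, controls, experiments) then yields a matrix with one column per simulation, as described.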
Now, my questions:

1. Given that I'm already using compilePKGS(T) and enableJIT(3) (from the built-in compiler library), is there anything further to be gained by manually running compile() or cmpfun() on my wrapper function and the interpreted functions it calls?

2. Is there an optimal enableJIT() value, or, if I don't care about startup time, is "the more, the better"?

3. I'd also like to be able to occasionally drop into a running simulation (call browser(), save out internal objects, etc.) without having to abort the whole run. The obvious approach is to have the wrapper check for a flag, such as the existence of a file, and source a debug script if it is set. But I imagine that pinging the file system that often will start to add up. Is there a consensus on the most efficient way to communicate a boolean value (i.e. source the debug script or don't) to a running R process (under Linux)?

Thanks.
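Edit: for concreteness, the compilation setup in question looks like this (the loop function is just a toy example):

```r
library(compiler)
compilePKGS(TRUE)  # byte-compile packages as they are loaded
enableJIT(3)       # most aggressive JIT level

# The question is whether explicit cmpfun() adds anything on top of this.
f  <- function(x) { s <- 0; for (i in x) s <- s + i; s }
fc <- cmpfun(f)    # explicitly byte-compiled copy of the same function
```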
This will likely only address part of your questions. I've had luck speeding up processes by avoiding the apply functions as well; apply() is not vectorized and actually takes quite a bit of time. I saw gains by using nested ifelse() statements instead.
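For instance, a row-wise maximum that might otherwise be written with apply() can be expressed with a vectorized ifelse() (an illustrative comparison, not your actual tests):

```r
m <- matrix(rnorm(2e4), ncol = 2)

# apply() version: one R-level function call per row
a <- apply(m, 1, function(r) if (r[1] > r[2]) r[1] else r[2])

# ifelse() version: a single vectorized pass over the columns
b <- ifelse(m[, 1] > m[, 2], m[, 1], m[, 2])

all.equal(a, b)  # TRUE
```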
Have you tried Rprof()? It was useful in my case for identifying the slow elements of my code. Not a solution per se, but a useful diagnostic.
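For example, bracketing a representative chunk of the work with Rprof() calls (the file name and workload here are placeholders):

```r
Rprof("sim.out")                        # start writing profiling samples
for (i in 1:50) x <- sort(rnorm(2e5))   # stand-in for the simulation work
Rprof(NULL)                             # stop profiling
head(summaryRprof("sim.out")$by.total)  # time spent, broken down by function
```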