performance optimization basic loop-unrolling

Loop unrolling in BASIC

I have an embedded processor that is running a trimmed down version of BASIC (Parallax BASIC Stamp). In a loop, I am writing 1024 values via the SPI bus.

In compiled languages, more speed can be gained by unrolling the loop (putting more statements into the loop, reducing the overhead to statement ratio). However, I am not sure about BASIC since it is an interpretive language and each statement is interpreted before it is executed.

Profiling is difficult since I have to find an available pin, write a pulse to it, then measure with an o'scope.

From a theory point of view, does loop unrolling in BASIC provide any speed benefits?

Solution

In theory, loop unrolling reduces the amount of time spent on incrementing and comparison within the loop. By virtue of reducing the loop overhead time, there is a performance gain.

The amount of time gained may not be as significant on an interpreted program as a compiled program. There is an overhead of time required for the interpreter to fetch an instruction, interpret (build code) and execute the code for a statement. In order for the loop unrolling time savings to be significant, the time savings must be larger than this overhead.

Unlike microprocessors, interpreters may not be optimized for execution speed. Modern processors have high speed caches, branch prediction and look-ahead techniques. Some can even fetch new instructions into the cache as others are executed. Loop unrolling takes advantage of these features by reducing the number of jumps and making the execution more predictable. For compiled languages, this adds a significant savings (for large iterations). This performance time savings may not be applicable to most interpreters since they may not employ these features.

The best determination of performance improvement is through measurement. In my case, there has to be enough user complaint to justify the schedule hit for performing the measurement.