Tags: c, loops, embedded, microchip, xc8

How to accurately predict loop execution duration?


I am writing a program that requires me to know the execution time of a function for any number of loops. While fine-tuning I discovered that the execution time is non-linear, and I would like to know why. I rewrote the code into something I could share here, ran simulations on it, and got pretty much the same results. Here is the code:

void DummyLoop(long y)
{
    long counter = 0;
    long DummyArray[3];
    char loopArray;
    
    while(counter < y)
    {
        for(loopArray = 0; loopArray < 3; loopArray++)
        {
            DummyArray[loopArray] = 10000/50; // Some int division operation
        }
        counter++;
    }
}

I measured the results using an internal timer of the MCU, calling the function like this:

T0CON0bits.EN = 1; // Start timer
DummyLoop(10000);
T0CON0bits.EN = 0; // Stop timer

The results are then printed over UART once the timer is stopped. These are the times I measured executing the same function with a growing number of loops. All these results were measured individually for different loop count inputs.

(Image: a table of the measured execution times for increasing loop counts; the measured values are reproduced in the solution below.)

I added a column multiplying the time it took for one loop by the number of loops in that test, just to show how the results diverge. It does taper off after about 1000 loops, but why does the result change so much? I would have thought the time it takes to run one loop could simply be multiplied by any number of loops. The MCU I am using is an 8-bit PIC18F47Q43 for reference, compiled with XC8.


Solution

  • You cannot simply multiply the number of loops by some constant, because that ignores the overhead of the rest of the code.

    Instead, the correct formula is: time(loops) = loops * K1 + K2.

    We have the times for 1 and for 2 loops. This gives us the constant K1.

    K1 = time(2) - time(1) = 0.017625 ms - 0.0100625 ms = 0.0075625 ms

    With this result we can calculate the constant K2.

    K2 = time(1) - 1 * K1 = 0.0100625 ms - 1 * 0.0075625 ms = 0.0025 ms

    Let's look at results again.

    Loops   Measured (ms)   Calculated (ms)   Delta (ms)
        1       0.0100625       0.0100625      0
        2       0.017625        0.017625       0
        3       0.0251875       0.0251875      0
       10       0.078125        0.078125       0
       50       0.380625        0.380625       0
      100       0.75875         0.75875        0
      500       3.78375         3.78375        0
     1000       7.565           7.565          0
     5000      37.814          37.815          0.001
    10000      75.626          75.6275         0.0015

    Looks good, doesn't it? The differences for the last two values are almost certainly rounding errors in the output routine; calculations with float are commonly limited to about 6 to 7 significant digits.
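    The linear model can be checked with a short C program on a host machine (this runs on a PC, not the MCU); the constants are taken from the measured 1-loop and 2-loop times in the table above:

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* time(loops) = loops * K1 + K2, with K1 and K2 derived from
       the measured times for 1 loop and 2 loops (in ms). */
    static double predict(long loops)
    {
        const double t1 = 0.0100625;  /* measured time for 1 loop, ms     */
        const double t2 = 0.017625;   /* measured time for 2 loops, ms    */
        const double K1 = t2 - t1;    /* per-iteration cost: 0.0075625 ms */
        const double K2 = t1 - K1;    /* one-time overhead: 0.0025 ms     */
        return loops * K1 + K2;
    }

    int main(void)
    {
        printf("time(1000)  = %.4f ms\n", predict(1000));   /* 7.5650 ms  */
        printf("time(10000) = %.4f ms\n", predict(10000));  /* 75.6275 ms */
        return 0;
    }
    ```

    The predicted values match the "Calculated" column of the table.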


    Note 1:

    K1 is the time for one iteration of the outer loop:

    • Checking the outer loop condition (when it is true)
    • Running the inner for loop
    • The dummy calculation and assignment

    K2 is the time of the one-time overhead:

    • Start/stop the timer (details depend on the exact working of the timer)
    • Call and return of the function
    • Prologue and epilogue in the function
    • Exiting the outer loop when its condition finally fails

    Here is the code again, with an x marking which constant each line contributes to:

    K1  K2
        x   T0CON0bits.EN = 1; // Start timer
        x   DummyLoop(10000);
        x   T0CON0bits.EN = 0; // Stop timer
    
        x   void DummyLoop(long y)
        x   {
        x       long counter = 0;
        x       long DummyArray[3];
        x       char loopArray;
        
    x   x       while(counter < y)
    x           {
    x               for(loopArray = 0; loopArray < 3; loopArray++)
    x               {
    x                   DummyArray[loopArray] = 10000/50; // Some int division operation
    x               }
    x               counter++;
    x           }
        x   }
    

    Note 2:

    You can derive these times by summing the instruction cycles in the generated machine code.

    Or, if that is too complicated, toggle a port pin and measure the interval at that pin with an oscilloscope.
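    The pin-toggle approach could look like this sketch (the choice of pin RB0 is an assumption; use any free output pin on your board):

    ```c
    #include <xc.h>

    void DummyLoop(long y);     /* defined as in the question */

    void MeasureWithScope(void)
    {
        TRISBbits.TRISB0 = 0;   /* make RB0 an output */
        LATBbits.LATB0 = 1;     /* rising edge marks the start */
        DummyLoop(10000);
        LATBbits.LATB0 = 0;     /* falling edge marks the end */
    }
    ```

    The high time of the pulse on RB0 then includes the call/return overhead but not the timer start/stop, so the overhead constant K2 derived this way will differ slightly from the one above.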