Rust PyO3 function is faster in Python when run many times instead of once

I implemented an algorithm in Rust for speed and built it into a Python module. Running the function is indeed much faster than the Python implementation. But I noticed an interesting quirk: running the function many times (say, 1 million) is on average much faster per call than running it once or a few times.

import timeit

print(timeit.timeit(lambda: blob(9999, 16, (3, 5), (4, 11), (2, 4)), number=1))
print(timeit.timeit(lambda: blob(9999, 16, (3, 5), (4, 11), (2, 4)), number=1000000) / 1000000)

Output:

1.5100000382517464e-05
2.1137116999998398e-06

As you can see, when the function is executed 1000000 times, each call is on average about 7 times faster than when it is run only once.

Any idea why this is happening? Any help would be appreciated.

If the code of the Rust function is needed to pinpoint the problem, just leave a comment and I'll put it here :)

Solution

The easiest way to measure something (setting cache effects aside for a moment) is to take the difference between two timestamps in some unit of time; the smaller the unit, the higher the precision.

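use std::time::Instant;

let start = Instant::now();     // timestamp taken just before the call
f();                            // the function being measured
let duration = start.elapsed(); // time elapsed since `start`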
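The same timestamp-difference approach can be sketched on the Python side, timing each call individually so the first call can be compared with later ones. This is only a minimal sketch: `work` is a hypothetical stand-in for the Rust-backed function, and `time_calls` is a helper name invented for illustration.

```python
import time

def work():
    # Hypothetical stand-in for the Rust-backed call, e.g. blob(...)
    return sum(i * i for i in range(100))

def time_calls(func, calls=5):
    """Return the duration of each individual call, in nanoseconds."""
    durations = []
    for _ in range(calls):
        start = time.perf_counter_ns()   # timestamp before the call
        func()
        durations.append(time.perf_counter_ns() - start)
    return durations

durations = time_calls(work)
for i, ns in enumerate(durations):
    print(f"call {i}: {ns} ns")
```

On many machines the first entry tends to be the largest, but the exact numbers depend on the system and will vary from run to run.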

But, as others pointed out, caching happens at multiple levels. A single run typically pays all the one-time costs (cold CPU caches, untrained branch predictors, memory pages touched for the first time), whereas repeated runs of the same code find that data already cached, so the average time per call drops.
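To separate the one-time cost of a first call from the steady-state cost, one option is to time a single cold call, run a warm-up loop, and then take the minimum over several repeated batches. The sketch below assumes a hypothetical stand-in `work` in place of the real function; `measure` and its parameters are illustrative names, not part of any library.

```python
import timeit

def work():
    # Hypothetical stand-in for the Rust-backed call, e.g. blob(...)
    return sum(i * i for i in range(100))

def measure(func, warmup=1000, repeat=5, number=10000):
    """Return (cold, warm): seconds for the first call and per warmed-up call."""
    # First call: includes whatever one-time costs exist
    cold = timeit.timeit(func, number=1)

    # Warm-up calls whose timings are discarded
    for _ in range(warmup):
        func()

    # Several batches; the minimum is the least disturbed by other activity
    runs = timeit.repeat(func, repeat=repeat, number=number)
    warm = min(runs) / number
    return cold, warm

cold, warm = measure(work)
print(f"cold: {cold:.3e} s, warm: {warm:.3e} s per call")
```

Reporting both numbers side by side makes it clear whether a gap like the one in the question comes from the first call alone or persists after warm-up.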

Note also that PyO3 ultimately interacts with C code on the Python side, which already has its own low-level optimizations (including cached intermediate compiled units, such as .pyc files) to speed up execution-intensive work, especially repetitive work, using various techniques, among them caching to save time and space.