Tags: prolog, benchmarking, micro-benchmarks, sicstus-prolog

Dimensions of micro-benchmarking in Prolog


I want to micro-benchmark predicate int_cntA/2 ...

int_cntA(I,W) :- I >= 0, int_cntA0_cntA(I,0,W).

int_cntA0_cntA(0,W0,W) :- !, W0 = W.
int_cntA0_cntA(I,W0,W) :- I0 is I/\(I-1), W1 is W0+1,      int_cntA0_cntA(I0,W1,W).

... against predicate int_cntB/2:

int_cntB(I,W) :- I >= 0, int_cntB0_cntB(I,0,W).

int_cntB0_cntB(0,W0,W) :- !, W0 = W.
int_cntB0_cntB(I,W0,W) :- I0 is I>>1,     W1 is W0+(I/\1), int_cntB0_cntB(I0,W1,W).

I'm not 100% sure what I need to consider to get good results. What are the interesting dimensions here?

So far, I came up with:

  • Should I include meta-call performance in the benchmark, or should it be about raw number crunching?
  • Should the loop be failure-driven or not?
  • Should I care about the garbage generated during execution or not?

The following code snippet is a simple benchmark implementation that goes for raw performance, is failure-driven, and therefore does not account for garbage:

:- use_module(library(between)).

rep_10.
rep_10.
rep_10.
rep_10.
rep_10.
rep_10.
rep_10.
rep_10.
rep_10.
rep_10.

rep_100M :- rep_10, rep_10, rep_10, rep_10, rep_10, rep_10, rep_10, rep_10.
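Note that the harnesses below also use repeat/1, which is not a SICStus built-in (only repeat/0 is). The benchmarks assume a definition along these lines:

```prolog
% repeat(N): succeeds N times on backtracking (assumed helper, not a built-in).
repeat(_).
repeat(N) :- N > 1, N1 is N-1, repeat(N1).
```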

Code for int_cntA/2:

benchA_1(I,W,Rt) :- statistics(runtime,_),
                    ( repeat(100000000),      int_cntA(I,W), false ; true ),
                    statistics(runtime,[_,Rt]),
                    int_cntA(I,W).

benchA_2(I,W,Rt) :- statistics(runtime,_),
                    ( between(1,100000000,_), int_cntA(I,W), false ; true ),
                    statistics(runtime,[_,Rt]),
                    int_cntA(I,W).

benchA_3(I,W,Rt) :- statistics(runtime,_),
                    ( rep_100M,               int_cntA(I,W), false ; true ),
                    statistics(runtime,[_,Rt]),
                    int_cntA(I,W).

Code for int_cntB/2:

benchB_1(I,W,Rt) :- statistics(runtime,_),
                    ( repeat(100000000),      int_cntB(I,W), false ; true ),
                    statistics(runtime,[_,Rt]),
                    int_cntB(I,W).

benchB_2(I,W,Rt) :- statistics(runtime,_),
                    ( between(1,100000000,_), int_cntB(I,W), false ; true ),
                    statistics(runtime,[_,Rt]),
                    int_cntB(I,W).

benchB_3(I,W,Rt) :- statistics(runtime,_),
                    ( rep_100M,               int_cntB(I,W), false ; true ),
                    statistics(runtime,[_,Rt]),
                    int_cntB(I,W).

On an Intel Core i7 (Haswell) machine running SICStus Prolog 4.3.1, the worst-case performance differences due to the benchmarking method alone (variants 1, 2, and 3) exceed 100%:

| ?- benchA_1(0,W,Rt).
W = 0,
Rt = 3140 ? 
yes
| ?- benchA_2(0,W,Rt).
W = 0,
Rt = 4130 ? 
yes
| ?- benchA_3(0,W,Rt).
W = 0,
Rt = 1960 ? 
yes

Do you have ideas if/how I could further reduce the overhead of the micro-benchmark? Thank you!


Solution

  • You have to be careful about cold and warm runs. Modern Prolog systems have just-in-time indexing or even just-in-time compilation, so different runs can behave differently. Garbage collection also makes timings vary.

    I always measure warm runs: I first run the benchmark at least once and throw away that measurement, then run it again and record the result.
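    As a sketch (warm_run/2 is my own hypothetical helper, assuming a benchmark goal that takes one extra argument for the measured runtime, such as benchA_3(0, W)), a warm measurement could look like:

      % Run Goal once to warm up (discard the time), then once for the real
      % measurement. The cold run fills caches and triggers JIT indexing.
      warm_run(Goal, Rt) :-
          call(Goal, _Discard),   % cold run: time thrown away
          call(Goal, Rt).         % warm run: the time we actually report

    Usage would then be, e.g., `?- warm_run(benchA_3(0, W), Rt).`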

    Concerning different loops: the usual approach is to measure the loop overhead by inserting a dummy predicate into the loop and timing the loop itself, then subtracting that time afterwards. The dummy predicate can easily be defined as:

      dummy.
    
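    Applied to the harness from the question, the subtraction could be sketched as follows (bench_dummy/1 and benchA_net/3 are hypothetical names, not part of the original code):

      % Measure the bare loop with the dummy body: this is the loop overhead.
      bench_dummy(Rt) :-
          statistics(runtime, _),
          ( rep_100M, dummy, false ; true ),
          statistics(runtime, [_,Rt]).

      % Net time for int_cntA/2: total time minus loop overhead.
      benchA_net(I, W, NetRt) :-
          bench_dummy(Overhead),
          benchA_3(I, W, TotalRt),
          NetRt is TotalRt - Overhead.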

    If you see that the looping overhead is small, I think it is also valid not to go to the lengths of subtracting it, as long as you clearly state that your measurements include the loop.

    When you use the same loop, you can still compare the results for the different test objects your harness runs on, even if you don't subtract the overhead.