I have a randomized program with sequential and parallel variants. By its nature, the program's run-time varies drastically depending on its "luck": it regularly takes anywhere between 1 second and 2 minutes, in a seemingly geometric-distribution-ish pattern. The parallel variants show similar behaviour with different numbers.
What is a "good" way to measure parallel speedup in this case? One option would be to simply use the mean or median of the measured run-times as a representative of "the run-time".
How would I justify such an approach, and is there a (statistically/mathematically) better way to calculate the speedup?
EDIT: Thanks to user3666197, who pointed out some very important technical details needed to get good data. I have done that homework and want to clarify my question.
I have made my benchmark process as reliable as I could get it.
My question remains: how should I approach the calculation of the speedup for this program?
What I have done:
Mean sequential run-time is about 8.38, while the median is 4.8, which is a big difference. For 2 threads, the mean run-time is 4.36 and the median 2.42. Dividing sequential by parallel gives speedups of 1.92 (means) and 1.992 (medians). For 4 threads the picture is similar: a mean run-time of 2.25 gives a speedup of 3.72, while a median run-time of 1.12 gives a speedup of 4.3 (superlinear). The numbers for 8 threads look similar.
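As a minimal sketch of this summary-based calculation (the run-time lists below are made-up placeholders, not my measured data), the point is to summarize each configuration first and only then take the ratio:

```python
import statistics

# Hypothetical run-time samples in seconds, one entry per benchmark run.
seq_times = [1.2, 3.5, 4.8, 9.7, 31.0, 110.0]   # sequential variant
par4_times = [0.4, 0.9, 1.1, 2.6, 7.8, 25.0]    # 4-thread variant

# Summarize each configuration first, then take the ratio.
speedup_mean = statistics.mean(seq_times) / statistics.mean(par4_times)
speedup_median = statistics.median(seq_times) / statistics.median(par4_times)

print(f"speedup (ratio of means):   {speedup_mean:.2f}")
print(f"speedup (ratio of medians): {speedup_median:.2f}")
```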
I have tried to visualize the data in different ways; see the plots.
The histogram shows the distribution of run-times for the different thread counts, as does the boxplot to its right. Some speedup is clearly visible.
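For reference, a minimal matplotlib sketch of such a histogram/boxplot figure, assuming the run-times are grouped per thread count in a dict (all names and values here are placeholders):

```python
import matplotlib.pyplot as plt

# Hypothetical measured run-times in seconds, keyed by thread count.
runtimes = {
    1: [1.2, 3.5, 4.8, 9.7, 31.0, 110.0],
    2: [0.7, 1.8, 2.4, 5.1, 16.0, 60.0],
    4: [0.4, 0.9, 1.1, 2.6, 7.8, 25.0],
}

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Overlaid histograms of run-times, one per thread count.
for threads, times in runtimes.items():
    ax_hist.hist(times, bins=20, alpha=0.5, label=f"{threads} thread(s)")
ax_hist.set_xlabel("run-time [s]")
ax_hist.set_ylabel("count")
ax_hist.legend()

# Boxplots of the same data, side by side.
ax_box.boxplot(list(runtimes.values()), labels=[str(t) for t in runtimes])
ax_box.set_xlabel("threads")
ax_box.set_ylabel("run-time [s]")

plt.tight_layout()
plt.show()
```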
If I pair the measurements by seed, I get pairs of sequential and parallel run-times. One of my first ideas was to calculate the speedup from the slope of a regression line; however, the regression line does not seem to "summarize" the data properly and is of limited value. In the bottom-right plot, only the points for 4 threads are shown.
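A sketch of that regression idea, with hypothetical per-seed data; the through-origin fit is the variant in which the slope can be read directly as a speedup (intercept forced to zero):

```python
import numpy as np

# Hypothetical paired measurements: run-time in seconds per seed.
seq_by_seed = {0: 4.8, 1: 12.3, 2: 1.9, 3: 55.0}
par4_by_seed = {0: 1.3, 1: 3.1, 2: 0.6, 3: 14.0}

seeds = sorted(seq_by_seed.keys() & par4_by_seed.keys())
x = np.array([par4_by_seed[s] for s in seeds])   # parallel times
y = np.array([seq_by_seed[s] for s in seeds])    # sequential times

# Ordinary least-squares line y = a*x + b.
a, b = np.polyfit(x, y, 1)

# Regression through the origin: slope = sum(x*y) / sum(x*x),
# which is what reading the slope as a speedup implicitly assumes (b = 0).
slope_origin = np.dot(x, y) / np.dot(x, x)

print(f"OLS slope: {a:.2f}, intercept: {b:.2f}")
print(f"through-origin slope: {slope_origin:.2f}")
```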
I would recommend computing the speedup based on the arithmetic mean of the run-times of a sufficiently large set of measurements, and clearly communicating what the numbers represent. It can be difficult to ensure that the set of measurements is large enough to estimate the mean with a given confidence, especially since your samples are not normally distributed, so include your findings about the distribution and the confidence. Make sure to summarize the run-times first, before computing the speedup.
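To make the confidence part concrete, one possible (not the only) approach is a percentile bootstrap on the ratio of arithmetic means, which avoids any normality assumption; the sample data and function name below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical run-time samples in seconds.
seq_times = np.array([1.2, 3.5, 4.8, 9.7, 31.0, 110.0, 7.3, 2.1])
par_times = np.array([0.4, 0.9, 1.1, 2.6, 7.8, 25.0, 2.0, 0.7])

def bootstrap_speedup_ci(seq, par, n_boot=10_000, alpha=0.05):
    """Percentile bootstrap CI for the ratio of arithmetic means."""
    ratios = np.empty(n_boot)
    for i in range(n_boot):
        s = rng.choice(seq, size=seq.size, replace=True)
        p = rng.choice(par, size=par.size, replace=True)
        ratios[i] = s.mean() / p.mean()
    lo, hi = np.quantile(ratios, [alpha / 2, 1 - alpha / 2])
    return seq.mean() / par.mean(), lo, hi

speedup, lo, hi = bootstrap_speedup_ci(seq_times, par_times)
print(f"speedup = {speedup:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```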
There is an excellent paper by Torsten Hoefler and Roberto Belli that covers these issues in detail, particularly Sections 2.1.1 and 3.