mpi time-complexity complexity-theory openmpi hpc

Are the Hockney model parameters functions of message size?

Using the Hockney model, transfer time is modeled by t(m) = α + βm, where m is the message size in bytes, α is the latency per message, and β is the transfer time per byte (the reciprocal of the network bandwidth).

But according to some papers (like this paper), latency and transfer time are functions of message size. Across different message sizes, they are neither constant nor linear!

If the Hockney model parameters are functions of message size, how can we predict collective communication time (e.g. for broadcast, scatter, ...) for different message sizes?

Example: If the broadcast operation is performed by the Flat Tree algorithm with P processes, t(m) = (P-1)(α + βm). Because α and β are functions of message size, we cannot plot this as a straight line, and we cannot predict the operation time without model parameters that correspond to the message size. For instance, we cannot predict the operation time for a message size of 30 bytes if we have not measured model parameters which send and receive 30 byte messages.

Solution

In the Hockney model, α and β are properties of the network, independent of the message size. The paper you mention, however, clearly states:

We altered Hockney model such that α and β are functions of message size.

I agree it is confusing that they nevertheless keep referring to their altered model simply as Hockney. The chart in the paper also looks suspiciously as if "Latency" is actually the total message transfer time. You might call this the latency as seen from the application. Likewise, "Bandwidth" is the bandwidth as seen from the application. Consider 10^6 bytes / 65 MBytes/s ≈ 1.5 * 10^4 us. I don't see any sense in using two values that both reflect the total message transfer time as separate additive network parameters for Hockney. Unfortunately, the paper does not explain how they actually derived the parameters from their point-to-point MPI benchmark.
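The arithmetic behind the bandwidth observation can be spelled out as follows. This is only a unit check, assuming 1 MByte = 10^6 bytes; the two input values are the ones quoted above.

```latex
\[
\frac{10^{6}\ \text{bytes}}{65 \times 10^{6}\ \text{bytes/s}}
  \approx 1.54 \times 10^{-2}\ \text{s}
  = 1.54 \times 10^{4}\ \mu\text{s}
\]
```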

It is also noteworthy that the paper almost always uses the full term for the message transfer time, α(ms) + ms · β(ms), except in two cases, where I suspect a pair of parentheses is missing. The whole term could then simply be replaced with a point-to-point message time as a function of message size.

For the model, I would prefer to use either pure Hockney with constant α and β, or a model that describes the p2p message time as a function of message size. In the latter case your question is still relevant:

For instance, we cannot predict the operation time for a message size of 30 bytes if we have not measured model parameters which send and receive 30 byte messages.

Either you have to measure all possible sizes, or you have to fit a model to the measurements. Incidentally, if you use linear regression, you end up with Hockney again.
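To illustrate the linear-regression route, here is a minimal sketch in plain Python. It fits constant α and β to a handful of point-to-point timings by ordinary least squares and then predicts a flat-tree broadcast for a size that was never measured. The function names, the process count of 8, and the timings are all assumptions made up for this example; the timings are synthetic (generated from α = 5 us and β = 0.01 us/byte), not real benchmark results.

```python
def fit_hockney(sizes, times):
    """Ordinary least-squares fit of t(m) = alpha + beta * m."""
    n = len(sizes)
    mean_m = sum(sizes) / n
    mean_t = sum(times) / n
    sxx = sum((m - mean_m) ** 2 for m in sizes)
    sxy = sum((m - mean_m) * (t - mean_t) for m, t in zip(sizes, times))
    beta = sxy / sxx                  # slope: transfer time per byte
    alpha = mean_t - beta * mean_m    # intercept: per-message latency
    return alpha, beta


def p2p_time(m, alpha, beta):
    """Hockney point-to-point time for an m-byte message."""
    return alpha + beta * m


def flat_tree_bcast_time(m, procs, alpha, beta):
    """Flat-tree broadcast: the root sends to each of the other procs - 1 ranks."""
    return (procs - 1) * p2p_time(m, alpha, beta)


# Synthetic (hypothetical) p2p timings in microseconds, one per message size in bytes.
sizes = [8, 64, 512, 4096]
times = [5.0 + 0.01 * m for m in sizes]

alpha, beta = fit_hockney(sizes, times)

# Predict a broadcast of 30 bytes over 8 processes, a size that was not measured.
predicted = flat_tree_bcast_time(30, 8, alpha, beta)
print(f"alpha={alpha:.3f} us, beta={beta:.5f} us/byte, bcast(30 B, P=8)={predicted:.2f} us")
```

With real measurements the fit will not be exact, and the residuals show how far the network deviates from a single straight line. If they are large, a piecewise fit per size range or interpolation between measured sizes may describe the p2p time better than one pair of constants.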