What is the asymptotic worst case time complexity of an algorithm that involves parallel processing? If there are O(p) processes run in parallel each with a runtime of O(n), is the runtime still O(n) in theory, regardless of the size of p relative to n, or is the runtime dependent on O(p) as well?
I am looking both for the canonical method used in theory texts to define the computational complexity of programs with parallel processes, and for the practical considerations it raises, for example hardware limitations.
Most of the time, the number of processing units is not taken into account in the complexity: having p processing units does not change it, since p is generally considered a (small) constant. For example, a basic linear search on an array runs in O(n) time sequentially, while it can be done in O(n/p) time in parallel (see the sketch below).
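A minimal sketch of that p-way decomposition (illustrative Python of my own; the names are made up, and CPython's ThreadPoolExecutor will not actually run the scans concurrently because of the GIL; the point is only that each worker scans about n/p elements):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_search(a, target, p=4):
    """Split the array into p chunks and scan them "in parallel".
    Each worker performs at most ceil(n/p) comparisons, which is the
    O(n/p) parallel time; the total work across all workers is still O(n)."""
    n = len(a)
    chunk = max(1, (n + p - 1) // p)   # ceil(n / p) elements per worker

    def scan(start):
        # One worker's job: a plain sequential scan of its own chunk.
        for i in range(start, min(start + chunk, n)):
            if a[i] == target:
                return i
        return -1

    with ThreadPoolExecutor(max_workers=p) as pool:
        for found in pool.map(scan, range(0, n, chunk)):
            if found != -1:
                return found
    return -1

print(parallel_search(list(range(100)), 42, p=4))  # prints 42
```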
is big or even infinite. To study the complexity of such parallel algorithms, they have designed a theoretical model of parallel machine called PRAM (it is not very realistic when p
is huge but it helps to understand the scalability of algorithms). A good example is a bitonic sort: the sequential version runs in O(n log² n)
, but the parallel version with no limit to the number of processing units runs in O(log² n)
. It is known that even with an infinite amount of processing units cannot be lower than O(log n)
(but AFAIK the best known parallel sorting algorithms runs in O(log² n)
when p is sufficiently large). This shows that a sorting algorithm does not linearly scale with the number of processing unit: it is not naively O((n log n) / p)
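To see where the O(log² n) comes from, here is a sketch of the classic iterative bitonic sorting network (again my own illustrative code, for arrays whose length is a power of two). The two outer loops execute O(log² n) rounds in total; within one round, all n compare-exchanges are independent, so a PRAM with p = n processors does each round in O(1) time, while running the same network sequentially, as below, costs O(n log² n):

```python
def bitonic_sort(a):
    """Iterative bitonic sorting network; len(a) must be a power of two."""
    n = len(a)
    k = 2
    while k <= n:                 # log2(n) stages
        j = k // 2
        while j >= 1:             # up to log2(n) sub-stages per stage: O(log^2 n) rounds
            for i in range(n):    # the n compare-exchanges of a round are independent
                l = i ^ j         # (they run simultaneously on a PRAM with n processors)
                if l > i:
                    if ((i & k) == 0 and a[i] > a[l]) or \
                       ((i & k) != 0 and a[i] < a[l]):
                        a[i], a[l] = a[l], a[i]
            j //= 2
        k *= 2
    return a

print(bitonic_sort([7, 3, 1, 8, 6, 2, 5, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```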
Including p in the complexity is a good idea when the algorithm is meant to be executed on massively parallel platforms (when p is small, keeping it in the bound is nearly useless, since it behaves like a hidden constant). While assuming that p is "sufficiently large" or even "infinite" seems unrealistic at first glance, it is far from useless, because massively parallel hardware exists nowadays: supercomputers are a good example, and so are GPU clusters. That being said, (expensive) data transfers, synchronization (which does not run in constant time) and low-level hardware capabilities are often not accounted for in the complexity because of the theoretical model used (the PRAM is too simplistic to capture them, but more advanced models exist for analysing parallel algorithms).
To put it shortly, p can be treated either as a constant or as a variable in the complexity. If it is a constant, it can simply be dropped, because constants do not matter in a complexity bound (unless they are really huge). The analysis of parallel algorithms (i.e. keeping p in the bounds) is actually a whole field of research in computer science, and has existed for several decades, since before multicore processors existed.