I have an interval-treeish algorithm I would like to run in parallel for many queries using threads. Problem is that then each thread would need its own array, since I cannot know in advance how many hits there will be.
There are other questions like this, and the solution suggested is always to have an array of size (K, t) where K is output length and t is number of threads. This does not work for me as K might be different for each thread and each thread might need to resize the array to fit all the results it gets.
Pseudocode:
for i in prange(len(starts)):
qs, qe, qx = starts[i], ends[i], index[i]
results = t.search(qs, qe)
if len(results) + nfound < len(output):
# add result to output
else:
# resize array
# then add results
An usual pattern is that every thread gets its own container, which is a trade-off between speed/complexity and memory-overhead:
i
-value)".After the parallel section, the data must be either collected in a final container in a post processing step (which also could happen in parallel) or the subsequent algorithms should be able to handle a collection of containers.
Here is an example using c++-vector (which already has memory management and increasing size built-in):
%%cython -+ -c=/openmp --link-args=/openmp
from cython.parallel import prange, threadid
from libcpp.vector cimport vector
cimport openmp
def calc_in_parallel(N):
cdef int i,k,tid
cdef int n = N
cdef vector[vector[int]] vecs
# every thread gets its own container
vecs.resize(openmp.omp_get_max_threads())
for i in prange(n, nogil=True):
tid = threadid()
for k in range(i):
# use container of the thread
vecs[tid].push_back(k) # dummy for calculation
return vecs
Using omp_get_max_threads()
for the number of threads will overestimate the real number of threads in many cases. It is probably more robust to set the number of threads explicitly in prange
, i.e.
...
NUM_THREADS = 2
vecs.resize(NUM_THREADS)
for i in prange(n, nogil=True, num_threads = NUM_THREADS):
...
A similar approach can be applied using pure C, but more boiler plate code (memory management) will be needed in this case.