Right now I'm filtering an array using
arr = [a for a in tqdm(replays) if check(a)]
However, with hundreds of thousands of elements, this takes a lot of time. I was wondering if it is possible to do this with multiprocessing, ideally in a nice, compact, Pythonic way.
Define a parallel filter function, pfilter, built on multiprocessing:
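from multiprocessing import Pool

def pfilter(filter_func, arr, cores):
    # Evaluate filter_func on every element across `cores` worker processes.
    with Pool(cores) as p:
        booleans = p.map(filter_func, arr)
    # Keep only the elements whose check returned a truthy value.
    return [x for x, b in zip(arr, booleans) if b]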
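Using p.map means the elements are checked independently of one another, with no guaranteed order of execution. The results still come back in input order, which is why zipping them with arr works.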
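As a quick sanity check of that ordering claim, here is a minimal sketch that compares pfilter against a plain list comprehension. The predicate is_even and the sample data are made up for illustration; they stand in for check and replays.

```python
from multiprocessing import Pool


def is_even(n):
    """Hypothetical stand-in for check(); must be defined at module level."""
    return n % 2 == 0


def pfilter(filter_func, arr, cores):
    # Same function as above, repeated so this snippet runs on its own.
    with Pool(cores) as p:
        booleans = p.map(filter_func, arr)
    return [x for x, b in zip(arr, booleans) if b]


if __name__ == "__main__":
    numbers = list(range(20))
    sequential = [n for n in numbers if is_even(n)]
    parallel = pfilter(is_even, numbers, 2)
    # Both lists should contain the same elements in the same order.
    print(parallel == sequential)
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by spawning a fresh interpreter (Windows, and macOS by default), since each worker re-imports the main module.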
In your case, with 4 CPUs, the usage is:
arr = pfilter(check, tqdm(replays), 4)
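Note, however, that filter_func can't be a lambda expression or a function defined from one. multiprocessing pickles the function to send it to the worker processes, and the standard pickle module can't serialize lambdas.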
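If you need a parameterized predicate, one common workaround is to define a normal module-level function and bind its arguments with functools.partial. The sketch below only checks what pickle accepts; longer_than and is_picklable are hypothetical helpers, not part of multiprocessing.

```python
import pickle
from functools import partial


def longer_than(min_len, s):
    """Hypothetical predicate: True if s has more than min_len characters."""
    return len(s) > min_len


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A lambda has no importable name, so pickle cannot reference it.
as_lambda = lambda s: len(s) > 3

# A partial of a module-level function is pickled by reference to that function.
as_partial = partial(longer_than, 3)

if __name__ == "__main__":
    print("lambda picklable: ", is_picklable(as_lambda))
    print("partial picklable:", is_picklable(as_partial))
```

Assuming pfilter as defined above, as_partial could then be passed in place of a lambda, e.g. pfilter(as_partial, words, 4) for some list words.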