python multiprocess tqdm

tqdm progress bar only shows after the process ends with ProcessPoolExecutor


My tqdm progress bar doesn't show while my parallel job (using ProcessPoolExecutor) is running; I only see it after the job has finished.

Here is a way to reproduce the problem. I wrote these two functions:

from concurrent.futures import ProcessPoolExecutor
import sys
from colorama import Fore

def parallelize(desc, func, array, max_workers):
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        output_data = list(progress_bar(desc, list(executor.map(func,array))))
    return output_data

def progress_bar(desc, array):
    return tqdm(array,
            total=len(array),
            file=sys.stdout,
            ascii=' >',
            desc=desc,
            bar_format="%s{l_bar}%s{bar:30}%s{r_bar}" % (Fore.RESET, Fore.BLUE, Fore.RESET))

You can test it this way:

from tqdm import tqdm
  
test = range(int(1e4))
   
def identity(x):
    return x

parallelize("", identity, test, 2)

It prints the line below, reporting 00:00 elapsed, even though the whole run takes around 3 seconds:

100%|>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>| 10000/10000 [00:00<00:00, 3954279.25it/s]

Thanks for the help


Solution

  • I think this is because, when you call your progress bar with

    output_data = list(progress_bar(desc, list(executor.map(func,array))))
    

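    Python first evaluates list(executor.map(func, array)), which blocks until every task has finished, and only then passes the completed results to progress_bar. By that point tqdm is just iterating over an in-memory list, which is why it reports 00:00. It won't be the same as your code, but I can share a boilerplate for parallelizing a Python function.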

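    from joblib import Parallel, delayed
    from tqdm import tqdm
    
    def func(a):
        # Do something with a
        return a
    
    # Parallelize the call (array is your input sequence).
    # tqdm wraps the input iterable, so the bar advances as tasks are
    # handed to the workers rather than when each result comes back.
    results = Parallel(n_jobs=-1)(delayed(func)(a) for a in tqdm(array, total=len(array)))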
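
If you would rather stay with ProcessPoolExecutor, the ordering problem described above can probably be avoided by handing tqdm the lazy iterator that executor.map returns, instead of a list that has already been fully built. The following is only a minimal sketch under a few assumptions: the chunksize parameter is my own addition (it is not in the question's code), and the colorama bar formatting is left out for brevity.

```python
from concurrent.futures import ProcessPoolExecutor

from tqdm import tqdm


def identity(x):
    return x


def parallelize(desc, func, array, max_workers, chunksize=1):
    # Materialize the input once so len() works for any iterable.
    items = list(array)
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns a lazy iterator, so tqdm advances as
        # results arrive instead of after all the work is done.
        results = executor.map(func, items, chunksize=chunksize)
        return list(tqdm(results, total=len(items), desc=desc))


if __name__ == "__main__":
    # chunksize > 1 (an assumption for this demo) batches tiny tasks
    # to cut down on inter-process overhead.
    output = parallelize("demo", identity, range(int(1e4)), 2, chunksize=100)
```

Two things to keep in mind with this approach: executor.map yields results in submission order, so one slow early task can stall the bar even while later tasks finish, and with chunksize greater than 1 the bar moves in bursts because results come back a chunk at a time.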