Tags: python, progress-bar, distributed, dask

Dask ProgressBar doesn't work with distributed backend


The progress bar works beautifully when used with the multiprocessing backend but doesn't seem to work at all when using a distributed scheduler as the backend.

Is there a way around this? Or another solution? The distributed package has some progress bars itself but they all require a list of futures to work.


Solution

  • The key difference is that with the multithreading or multiprocessing schedulers, results are piped back to the controlling thread, whereas with distributed they are computed asynchronously on the cluster (even if that cluster runs on your local machine). If you previously had code like

    from dask.diagnostics import ProgressBar
    with ProgressBar():
        out = collection.compute()
    

    Now you can do

    from dask.distributed import progress
    out = c.compute(collection)   # c is the client; this returns a future right away
    progress(out)                 # shows a progress bar for that future
    

    To collect the result, call out.result() or c.gather(out).

    Note that the distributed scheduler also serves a graphical dashboard, by default at http://yourhost:8787 (see the status/ page). There you can watch your tasks execute without invoking a progress bar at all.
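
For reference, here is a minimal end-to-end sketch of the distributed pattern described above. The `square` task and the `run` helper are hypothetical names made up for illustration. `Client(processes=False)` is an assumption made only so the demo runs inside a single process; a real deployment would normally connect to an existing scheduler instead.

```python
import dask
from dask.distributed import Client, progress


@dask.delayed
def square(x):
    # Hypothetical stand-in for real work.
    return x * x


def run():
    # Assumption: a throwaway in-process cluster (thread-based workers),
    # started just for this demo and shut down when the block exits.
    with Client(processes=False) as c:
        # Build a small lazy graph: the sum of squares of 0..9.
        total = dask.delayed(sum)([square(i) for i in range(10)])
        out = c.compute(total)   # submit to the cluster, get a future back
        progress(out)            # text progress bar outside a notebook
        return out.result()      # fetch the final value from the cluster


if __name__ == "__main__":
    print(run())
```

Outside a notebook, `progress` draws a text bar and waits for the futures to finish. In a notebook it displays a widget and returns immediately, so `out.result()` is what actually waits for the value.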