python pandas cpu multicore dask

Move from pandas to Dask to utilize all local CPU cores


Recently I stumbled upon http://dask.pydata.org/en/latest/ and, since I have some pandas code that only runs on a single core, I wonder how to make use of my other CPU cores. Would Dask work well to use all (local) CPU cores? If yes, how compatible is it with pandas?

Could I use multiple CPUs with pandas? So far I have read about releasing the GIL, but that all seems rather complicated.


Solution

  • Would dask work well to use all (local) CPU cores?

    Yes.

    How compatible is it with pandas?

    Pretty compatible, but not 100%. You can mix pandas, NumPy, and even pure Python code with Dask if needed.

    Could I use multiple CPUs with pandas?

    You could. The easiest way would be to use multiprocessing and keep your data separate: have each job independently read from disk and write to disk, if you can do so efficiently. A significantly harder way is to use mpi4py, which is most useful if you have a multi-computer environment with a professional administrator.
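
The multiprocessing approach described above could look roughly like the sketch below. It assumes the data is already split into separate CSV files, and that each file has `key` and `value` columns. The file layout, the column names, and the helper names `process_file` and `run_parallel` are all hypothetical and only serve as an illustration. Each worker process reads its own file, runs ordinary pandas code, and writes its own output file, so no DataFrame has to be passed between processes.

```python
import multiprocessing as mp
from pathlib import Path

import pandas as pd


def process_file(job):
    """Read one CSV, aggregate it with pandas, and write the result to disk."""
    in_path, out_path = job
    df = pd.read_csv(in_path)
    # Placeholder computation (an assumption): sum of "value" per "key".
    result = df.groupby("key", as_index=False)["value"].sum()
    result.to_csv(out_path, index=False)
    return out_path


def run_parallel(in_paths, out_dir, workers=None):
    """Process every input file in its own worker process."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # One (input, output) pair per job; outputs keep the input file name.
    jobs = [(str(p), str(out_dir / Path(p).name)) for p in in_paths]
    # workers=None lets Pool start one process per available CPU.
    with mp.Pool(processes=workers) as pool:
        return pool.map(process_file, jobs)


if __name__ == "__main__":
    # Hypothetical input location; replace with real paths.
    inputs = sorted(Path("data").glob("part-*.csv"))
    print(run_parallel(inputs, "out"))
```

Only the file paths cross the process boundary here, which keeps the overhead of pickling small. Whether this is faster than a single process depends on how expensive the per-file work is compared with the disk I/O.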