Recently I stumbled upon http://dask.pydata.org/en/latest/. I have some pandas code that runs on only a single core, and I wonder how to make use of my other CPU cores. Would dask work well to use all (local) CPU cores? If yes, how compatible is it with pandas?
Could I use multiple CPUs with pandas? So far I have read about releasing the GIL, but that all seems rather complicated.
Would dask work well to use all (local) CPU cores?
Yes.
How compatible is it with pandas?
Pretty compatible, though not 100%: Dask's DataFrame mirrors much of the pandas API, but not all of it. You can mix in pandas, NumPy, and even pure Python code with Dask if needed.
Could I use multiple CPUs with pandas?
You could. The easiest way would be to use multiprocessing and keep your data separate: have each job independently read from disk and write to disk, if you can do so efficiently. A significantly harder way is to use mpi4py, which is most useful if you have a multi-computer environment with a professional administrator.
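The multiprocessing approach could look roughly like the sketch below. It uses only the standard library so it runs anywhere; the body of `process_file` is where your own pandas code would go. The names `process_file` and `run_jobs`, the `input` directory, the `value` column, and the `.out` suffix are all assumptions made for the example.

```python
import csv
import multiprocessing as mp
from pathlib import Path


def process_file(job):
    """Handle one independent job: read one input file, write one output file."""
    in_path, out_path = job
    total = 0.0
    # Each worker reads its own file, so no data is shared between processes.
    with open(in_path, newline="") as f:
        for row in csv.DictReader(f):
            total += float(row["value"])
    # Each worker also writes its own result file.
    Path(out_path).write_text(f"{total}\n")
    return out_path


def run_jobs(jobs, processes=None):
    """Run the jobs in a pool of worker processes (defaults to one per CPU core)."""
    with mp.Pool(processes=processes) as pool:
        return pool.map(process_file, jobs)


if __name__ == "__main__":
    # Hypothetical layout: every CSV in ./input gets a matching .out file next to it.
    jobs = [(str(p), str(p.with_suffix(".out"))) for p in sorted(Path("input").glob("*.csv"))]
    if jobs:
        print(run_jobs(jobs))
```

The `if __name__ == "__main__":` guard matters on platforms where worker processes are started by re-importing the script. Because every job only gets a pair of file paths, very little has to be pickled and sent between processes.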