python pandas dask dask-distributed dask-delayed

Using Not Yet Implemented Pandas Functions in Dask

I believe I saw a recommendation in one of the Dask tutorials on how to use Pandas functions that are not yet implemented in the Dask framework when working with Dask dataframes, but I seem to have misplaced where I saw that. For example, I would like to use the Pandas function 'ewm'.

As a workaround, I'm converting my Dask dataframes into Pandas dataframes, running ewm over the Pandas dataframes, and then converting them back into Dask for later, more memory-intensive operations. Not the most efficient.

Is there a better strategy for this?

Solution

Dask Dataframe provides a variety of lower-level generic functions, such as map_partitions, custom Aggregations, Rolling, and more, that you can use to build up operations Dask does not implement directly.

There is some more information here: https://docs.dask.org/en/latest/best-practices.html#learn-techniques-for-customization