Need clarity in copying a dask.dataframe


Can the pandas.DataFrame.copy API be exactly imitated for a dask.dataframe.DataFrame using the following code?

from copy import copy
df2 = copy(df)

Is it a shallow copy or a deep copy? How can I do the other type of copy?

Or do I necessarily need to do the following?

df2 = dask.dataframe.from_delayed([part.copy(deep=True) for part in df.to_delayed()])

Will the second code snippet completely solve my problem, or are there caveats?


Solution

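  • As of 2018-07-01, Dask dataframes don't support operations that mutate the underlying data in place: operations return new dataframes rather than modifying existing ones, so copying shouldn't be necessary.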