I am using Dask to apply a function myfunc that adds two new columns, new_col_1 and new_col_2, to my Dask dataframe ddata. The function uses two existing columns, a1 and a2, to compute the new ones.
ddata[['new_col_1', 'new_col_2']] = ddata.map_partitions(
    lambda df: df.apply(
        lambda row: myfunc(row['a1'], row['a2']),
        axis=1, result_type="expand")
).compute()
This gives the following error:
ValueError: Metadata inference failed in `lambda`.
You have supplied a custom function and Dask is unable to determine the type of output that that function returns.
To resolve this please provide a meta= keyword.
How can I provide the meta keyword for this scenario?
meta can be provided via kwarg to .map_partitions:
some_result = dask_df.map_partitions(some_func, meta=expected_df)
expected_df can be specified manually, or you can compute it explicitly on a small sample of the data (in which case it will be a pandas DataFrame).
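For the scenario in the question, meta only needs to describe the columns the per-partition function returns. A minimal sketch, assuming myfunc returns a pair of floats (the dtypes are an assumption; adjust them to whatever myfunc actually produces):

# With result_type="expand", a function returning a tuple produces a
# DataFrame whose columns are labelled 0 and 1, so the meta uses those labels.
meta = {0: 'float64', 1: 'float64'}

result = ddata.map_partitions(
    lambda df: df.apply(
        lambda row: myfunc(row['a1'], row['a2']),
        axis=1, result_type="expand"),
    meta=meta)

ddata[['new_col_1', 'new_col_2']] = result.compute()

# Alternatively, build the expected output from a small pandas sample, let
# pandas infer the dtypes, and pass it as meta=expected_df:
sample = ddata.head(5)  # head() returns a pandas DataFrame
expected_df = sample.apply(
    lambda row: myfunc(row['a1'], row['a2']),
    axis=1, result_type="expand")

Either form works; the dict just saves you from constructing the sample output by hand.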
There are more details in the docs.