I'm trying to apply a dask-ml QuantileTransformer transformation to a percentage field and create a new field, percentage_qt, in the same dataframe. But I get the error "Array assignment only supports 1-D arrays". How can I make this work?
import pandas as pd
import dask.dataframe as dd
from dask_ml.preprocessing import QuantileTransformer
mydict = [{'percentage': 12.1, 'b': 2, 'c': 3, 'd': 4},
          {'percentage': 10.2, 'b': 200, 'c': 300, 'd': 400},
          {'percentage': 11.3, 'b': 2000, 'c': 3000, 'd': 4000}]
df = pd.DataFrame(mydict)
ddf = dd.from_pandas(df, npartitions=10)
qt = QuantileTransformer(n_quantiles=100)
x = ddf[['percentage']]
y = qt.fit_transform(x)
ddf['percentage_qt'] = y # <-- error happens here
The error you get is the following:
ValueError: Array assignment only supports 1-D arrays
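Here y is not a 1-D array: fit_transform returns a 2-D dask array with a single column, which cannot be assigned to a dataframe column directly.
You could use this trick: convert y to a dask dataframe that uses the same index as ddf: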
dfy = y.to_dask_dataframe(
    columns=['percentage_qt'],
    index=ddf.index)
For some reason, concat along axis 0 doesn't work here (it may be worth opening an issue on GitHub), so we can join the two dataframes instead:
ddf_out = ddf.join(dfy)
This returns the expected output:
print(ddf_out.compute())
   percentage     b     c     d  percentage_qt
0        12.1     2     3     4       1.000000
1        10.2   200   300   400       0.000000
2        11.3  2000  3000  4000       0.656772
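
As a side note, the shape mismatch behind the error can be illustrated with plain NumPy. This is only a minimal sketch: the NumPy array is a stand-in for the dask array returned by fit_transform (an assumption for illustration), and its values are copied from the output above.

```python
import numpy as np

# Stand-in for the transformer output (assumed layout):
# one row per sample, one column per transformed feature.
y = np.array([[1.0], [0.0], [0.656772]])

print(y.shape)    # (3, 1): 2-D, even though there is only one column

# Selecting the single column gives a 1-D array.
col = y[:, 0]
print(col.shape)  # (3,)
```

A single-column result is still 2-D, which is why a direct column assignment is rejected. Whether assigning such a 1-D slice directly works in dask may also depend on the array's partitions matching those of the dataframe, so the join approach above is the one shown to work here.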