With pandas, I can store lists as elements without issues, as in:
import pandas as pd
A = {"cls": "A"}
B = {"cls": "B"}
C = {"cls": ["A", "B"]}
df = pd.DataFrame([A,B,C])
type(df.iloc[2]["cls"]) # Returns `list`
But cudf.DataFrame does not accept a list, as we can see here:
import cudf
cu_df = cudf.DataFrame([A, B, C])
Fails with ArrowTypeError: Expected bytes, got a 'list' object
If we leave out C, it works:
import cudf
cu_df = cudf.DataFrame([A, B])
(no error)
Converting from a regular pandas DataFrame does not work either:
cu_df = cudf.DataFrame(df)
(fails with the same ArrowTypeError)
Any ideas on how to circumvent this?
After reading some documentation and this GitHub issue, I found that it says:
list operations are somewhat limited, and a column of lists can't be treated the same as a column of ndarrays in Pandas.
Thus, you might try converting the list into a string:
A = {"cls": "A"}
B = {"cls": "B"}
C = {"cls": str(["A", "B"])}
and use it in cudf:
df = pd.DataFrame([A, B, C])
cu_df = cudf.DataFrame(df)
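If you have many records, you could do the same conversion in one pass before building the DataFrame. The sketch below is only an illustration: `stringify_lists` is a hypothetical helper name, and it uses `json.dumps` instead of `str()` (my substitution, not something from the issue) so that the original list can be recovered later with `json.loads`.

```python
import json

def stringify_lists(records):
    """Return a copy of `records` with every list value JSON-encoded as a string.

    Hypothetical helper for illustration; non-list values are left untouched.
    """
    return [
        {key: json.dumps(value) if isinstance(value, list) else value
         for key, value in record.items()}
        for record in records
    ]

records = [{"cls": "A"}, {"cls": "B"}, {"cls": ["A", "B"]}]
flat = stringify_lists(records)

# Every "cls" value is now a plain string, so the column has a single type.
# `flat` can be passed to pd.DataFrame(...) and then cudf.DataFrame(...)
# exactly like [A, B, C] in the snippet above.

# Decoding a stringified entry back into a list when it is needed again:
restored = json.loads(flat[2]["cls"])
```

Note that the plain entries ("A", "B") stay as they are, so you need to know which rows were encoded before calling `json.loads` on them.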
If that does not help, the same issue also suggests:
explode each list column into a flat column, perform the binary operation, then construct a list column back from the result
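To make the explode / operate / rebuild pattern concrete, here is a rough sketch written with pandas so it runs without a GPU. The sample data and the `"_x"` suffix are made up for illustration. cuDF aims to mirror the pandas API, so similar calls may work on a cudf DataFrame, but that is an assumption you should check against your cuDF version.

```python
import pandas as pd

# Hypothetical example data: every row holds a list of strings.
df = pd.DataFrame({"cls": [["A"], ["B"], ["A", "B"]]})

# 1. Explode: one row per list element; the original row index is repeated.
flat = df["cls"].explode()

# 2. Perform the binary operation on the flat column
#    (here: string concatenation with a made-up suffix).
flat = flat + "_x"

# 3. Rebuild one list per original row by grouping on the repeated index.
df["cls"] = flat.groupby(level=0).agg(list)
```

The key point is that step 2 only ever sees a flat column of scalars, which is the case that is well supported; the list structure is restored afterwards from the index.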