How to save a pandas dataframe when a column contains sets

Saving a pandas dataframe where a column contains sets fails, as in the example below:

import pandas as pd

df = pd.DataFrame({"col_set": [{"A", "B", "C"}, {"D", "E", "F"}]})
df.to_parquet("df_w_col_set.parquet")

The following error is thrown:

ArrowInvalid: ("Could not convert {'C', 'B', 'A'} with type set: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column col_set with type object')

How can one save this kind of dataframe and avoid the error above?

Some semi-related posts mention providing a pyarrow schema, but I'm not clear on what type to use when consulting the pyarrow datatypes.

Code was run with Python 3.7.4, pandas==1.3.0 and pyarrow==3.0.0.

Mainly looking for a solution where upgrades are not needed, or at least minimized (to avoid breaking other dependencies).

Solution

As a workaround, you can convert each set to a string and use ast.literal_eval to parse the string back into a set after reading:

import ast

df.astype({'col_set': str}).to_parquet('data.parquet')
df1 = pd.read_parquet('data.parquet') \
        .assign(col_set=lambda x: x['col_set'].map(ast.literal_eval))
print(df1)

# Output
     col_set
0  {C, B, A}
1  {F, E, D}
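
One caveat with the string approach: on Python versions before 3.9 (including the 3.7.4 mentioned in the question), ast.literal_eval cannot parse 'set()', which is what str() produces for an empty set. A minimal sketch of a wrapper that handles that case follows; parse_set is a hypothetical helper name, not part of pandas or ast:

```python
import ast


def parse_set(text):
    """Parse the str() form of a set back into a set.

    Hypothetical helper: handles 'set()' explicitly, because
    ast.literal_eval only accepts it from Python 3.9 onwards.
    """
    if text.strip() == "set()":
        return set()
    value = ast.literal_eval(text)
    if not isinstance(value, set):
        # e.g. '{}' parses as an empty dict, not a set
        raise ValueError("not a set literal: %r" % (text,))
    return value
```

It could be passed to .map(...) in place of ast.literal_eval when reading the column back.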

Or you can convert each set to a tuple (or list) before writing, then convert back to a set after reading:

df.assign(col_set=df['col_set'].map(tuple)).to_parquet('test.parquet')
df1 = pd.read_parquet('test.parquet') \
        .assign(col_set=lambda x: x['col_set'].map(set))
print(df1)

# Output
     col_set
0  {C, B, A}
1  {F, E, D}

You can also use pickle.dumps and pickle.loads to serialize and deserialize each set:

import pickle

df.assign(col_set=df['col_set'].map(pickle.dumps)).to_parquet('test.parquet')
df1 = pd.read_parquet('test.parquet') \
        .assign(col_set=lambda x: x['col_set'].map(pickle.loads))
print(df1)

# Output
     col_set
0  {C, B, A}
1  {F, E, D}

In fact, any serialization/deserialization pair will work, except JSON applied directly, because JSON has no set type.
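
JSON can still be used if each set is converted to a list first, since a JSON array maps to a Python list. A minimal sketch using only the standard library follows; set_to_json and json_to_set are hypothetical helper names, and the sorting step assumes the elements are mutually comparable (as the strings in the example are):

```python
import json


def set_to_json(values):
    # Sort so the stored text is deterministic; assumes comparable elements.
    return json.dumps(sorted(values))


def json_to_set(text):
    # A JSON array comes back as a list, so rebuild the set from it.
    return set(json.loads(text))


encoded = set_to_json({"A", "B", "C"})
print(encoded)                                   # ["A", "B", "C"]
print(json_to_set(encoded) == {"A", "B", "C"})   # True
```

These would be used like the pickle pair: df['col_set'].map(set_to_json) before writing, and .map(json_to_set) after reading.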