pandas list dataframe tuples parquet

Parquet Converts DataFrame Tuples to Lists


Parquet seems unable to save and read back a DataFrame containing a tuple: the tuple becomes a list. Is this by design or a bug? Lists and dicts are restored as expected, and pickle saves and reads a tuple as expected. The example below saves a DataFrame consisting of a single tuple. When it is read back, the value is a list.

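import pandas as pd

# A one-cell DataFrame whose only value is a tuple
df = pd.DataFrame([[(0, 1)]], columns=['tuple'])
print(df)

# Write to Parquet, then read the file back
df.to_parquet('t')
df2 = pd.read_parquet('t', engine='pyarrow')
print(df2)  # the cell is no longer a tuple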

Solution

  • I have used Parquet files for some time now, but for some reason I never had a DataFrame with tuples.

    According to this documentation, tuples are not supported as a Parquet dtype.

    As I understand it from this document, tuples written to a Parquet file are stored as lists, so what you see looks like expected behavior rather than a bug.


    I tested this myself (I think it is what you experienced as well). At the time of saving df, column1 holds a tuple.

    When I read the file back, though, column1 comes back as a list.