python pandas dataframe parquet

Cannot Read Parquet File of Multi-Level Complex Index Data Frame


I can create a sample data frame with the following code and save it as Parquet. When I try to read it back, it throws "TypeError: unhashable type: 'numpy.ndarray'". Is it possible to save an index made up of tuples, or do I have to reset the index before saving to Parquet? Thanks

import pandas as pd

# Creating sample data
data = {
    'A': [1, 2, 3],
    'B': [6, 7, 8],
    'C': [11, 12, 13],
}

# Creating multi-index
index = pd.MultiIndex.from_tuples(
    [
        ((10, 30), (0.75, 1.0)), 
        ((10, 30), (0.75, 1.25)),
        ((10, 30), (1.0, 1.25))
    ],
    names=['level_0', 'level_1']
)

# Creating DataFrame with multi-index
df = pd.DataFrame(data, index=index)

print(df)

df.to_parquet(path="test.parquet")
pd.read_parquet("test.parquet")

Solution

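Parquet has no tuple type, so the tuple index levels are written as list values, and pandas cannot rebuild an index from them when reading. Reset the index before saving and rebuild it after loading:

    import pandas as pd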
    
    # Creating sample data
    data = {
        'A': [1, 2, 3],
        'B': [6, 7, 8],
        'C': [11, 12, 13],
    }
    
    # Creating multi-index
    index = pd.MultiIndex.from_tuples(
        [
            ((10, 30), (0.75, 1.0)), 
            ((10, 30), (0.75, 1.25)),
            ((10, 30), (1.0, 1.25))
        ],
        names=['level_0', 'level_1']
    )
    
    # Creating DataFrame with multi-index
    df = pd.DataFrame(data, index=index)
    
    print(df)
    
    # Resetting the index before saving to Parquet
    df_reset = df.reset_index()
    
    # Saving to Parquet
    df_reset.to_parquet("test.parquet")
    
    # Reading from Parquet
    df_read = pd.read_parquet("test.parquet")
    
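    # The tuples are stored as Parquet lists and typically come back as
    # NumPy arrays, which are unhashable. Convert them to tuples first.
    for level in ['level_0', 'level_1']:
        df_read[level] = df_read[level].apply(tuple)

    # Setting the index back to the original MultiIndex
    df_read.set_index(['level_0', 'level_1'], inplace=True)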
    
    print(df_read)
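
An alternative to storing tuples at all is to split each tuple level into plain scalar columns before saving, then zip them back together after loading. The sketch below is one way to do that, not the only one. The helper names `flatten_tuple_levels` and `restore_tuple_levels` and the `_lo` / `_hi` column suffixes are made up for illustration, and it assumes every index value is a 2-tuple. The file calls are left out so the sketch runs without a Parquet engine installed.

```python
import pandas as pd

LEVELS = ["level_0", "level_1"]


def flatten_tuple_levels(df, levels=LEVELS):
    """Hypothetical helper: turn each 2-tuple index level into two scalar columns."""
    frame = df.reset_index()
    for name in levels:
        frame[f"{name}_lo"] = [pair[0] for pair in frame[name]]
        frame[f"{name}_hi"] = [pair[1] for pair in frame[name]]
        frame = frame.drop(columns=name)
    return frame


def restore_tuple_levels(frame, levels=LEVELS):
    """Hypothetical helper: zip the scalar columns back into tuple index levels."""
    frame = frame.copy()
    for name in levels:
        lo, hi = f"{name}_lo", f"{name}_hi"
        frame[name] = list(zip(frame[lo], frame[hi]))
        frame = frame.drop(columns=[lo, hi])
    return frame.set_index(list(levels))


# Same sample data as in the question
data = {"A": [1, 2, 3], "B": [6, 7, 8], "C": [11, 12, 13]}
index = pd.MultiIndex.from_tuples(
    [
        ((10, 30), (0.75, 1.0)),
        ((10, 30), (0.75, 1.25)),
        ((10, 30), (1.0, 1.25)),
    ],
    names=LEVELS,
)
df = pd.DataFrame(data, index=index)

flat = flatten_tuple_levels(df)        # only int and float columns remain
restored = restore_tuple_levels(flat)  # tuple MultiIndex rebuilt

print(flat)
print(restored)
```

In real use, `flat.to_parquet("test.parquet")` and `pd.read_parquet("test.parquet")` would sit between the two helper calls. Because the saved frame holds only integers and floats, the file should also be easier to read from tools other than pandas.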