I can create a sample DataFrame with the following code and save it as Parquet. When I try to read it back, it throws "TypeError: unhashable type: 'numpy.ndarray'". Is it possible to save an index made up of tuples, or do I have to reset the index before saving to Parquet? Thanks
import pandas as pd
# Creating sample data
data = {
    'A': [1, 2, 3],
    'B': [6, 7, 8],
    'C': [11, 12, 13],
}
# Creating multi-index
index = pd.MultiIndex.from_tuples(
    [
        ((10, 30), (0.75, 1.0)),
        ((10, 30), (0.75, 1.25)),
        ((10, 30), (1.0, 1.25))
    ],
    names=['level_0', 'level_1']
)
# Creating DataFrame with multi-index
df = pd.DataFrame(data, index=index)
print(df)
df.to_parquet(path="test.parquet")
pd.read_parquet("test.parquet")
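One workaround is to convert the tuples in the index to strings, reset the index before writing, and set the index levels again by name after reading: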
import pandas as pd
data = {
    'A': [1, 2, 3],
    'B': [6, 7, 8],
    'C': [11, 12, 13],
}
# Use the string form of each tuple as the index label
index = pd.MultiIndex.from_tuples(
    [
        (str((10, 30)), str((0.75, 1.0))),
        (str((10, 30)), str((0.75, 1.25))),
        (str((10, 30)), str((1.0, 1.25)))
    ],
    names=['level_0', 'level_1']
)
df = pd.DataFrame(data, index=index)
print("Original DataFrame:")
print(df)
# Move the index levels into ordinary columns before writing
df_reset = df.reset_index()
df_reset.to_parquet(path="test.parquet")
df_read = pd.read_parquet("test.parquet")
# Rebuild the MultiIndex from the two level columns
df_read.set_index(['level_0', 'level_1'], inplace=True)
print("DataFrame read from Parquet:")
print(df_read)
which returns
Original DataFrame:
                       A  B   C
level_0  level_1
(10, 30) (0.75, 1.0)   1  6  11
         (0.75, 1.25)  2  7  12
         (1.0, 1.25)   3  8  13
DataFrame read from Parquet:
                       A  B   C
level_0  level_1
(10, 30) (0.75, 1.0)   1  6  11
         (0.75, 1.25)  2  7  12
         (1.0, 1.25)   3  8  13
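Note that with this approach the index labels that come back are strings such as "(10, 30)", not tuples. If you need real tuples again after reading, one option is to parse the strings with ast.literal_eval. Below is a minimal sketch of that idea; encode_index and decode_index are hypothetical helper names (not part of pandas), and the Parquet write/read step is left out so the snippet only shows the label conversion:

```python
import ast

import pandas as pd


def encode_index(index: pd.MultiIndex) -> pd.MultiIndex:
    """Replace every label with its string form, e.g. (10, 30) -> "(10, 30)"."""
    return pd.MultiIndex.from_tuples(
        [tuple(str(label) for label in row) for row in index],
        names=index.names,
    )


def decode_index(index: pd.MultiIndex) -> pd.MultiIndex:
    """Parse string labels such as "(10, 30)" back into Python tuples."""
    return pd.MultiIndex.from_tuples(
        [tuple(ast.literal_eval(label) for label in row) for row in index],
        names=index.names,
    )


original = pd.MultiIndex.from_tuples(
    [
        ((10, 30), (0.75, 1.0)),
        ((10, 30), (0.75, 1.25)),
        ((10, 30), (1.0, 1.25)),
    ],
    names=['level_0', 'level_1'],
)

encoded = encode_index(original)   # string labels, as used before to_parquet
decoded = decode_index(encoded)    # tuple labels, as wanted after read_parquet
print(list(decoded))
```

In the answer's code you would apply something like decode_index to df_read.index after the set_index call. ast.literal_eval only accepts Python literals, so this assumes every label is the str() of a plain tuple of numbers.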