python-3.x pandas dataframe parquet fastparquet

Why does the index name always appear in a parquet file created with pandas?


I am trying to create a parquet file from a pandas DataFrame. Even though I delete the index name before writing, it appears again when I re-read the parquet file. Can anyone help me with this? I want index.name to be None.

>>> import pandas as pd
>>> df = pd.DataFrame({'key': 1}, index=[0])
>>> df
   key
0    1
>>> df.to_parquet('test.parquet')
>>> df = pd.read_parquet('test.parquet')
>>> df
       key
index     
0        1
>>> del df.index.name
>>> df
   key
0    1
>>> df.to_parquet('test.parquet')
>>> df = pd.read_parquet('test.parquet')
>>> df
       key
index     
0        1

Solution

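  • It works as expected when the file is written with engine='pyarrow'. When it is written with engine='fastparquet', the index name comes back as 'index':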

    >>> import pandas as pd
    >>> df = pd.DataFrame({'key': 1}, index=[0])
    >>> df.to_parquet('test.parquet', engine='fastparquet')
    >>> df = pd.read_parquet('test.parquet')
    >>> del df.index.name
    >>> df
       key
    0    1
    >>> df.to_parquet('test.parquet', engine='fastparquet')
    >>> df = pd.read_parquet('test.parquet')
    >>> df
           key
    index     
    0        1  # index name reappears even after deleting it, when written with fastparquet
    >>> del df.index.name
    >>> df.to_parquet('test.parquet', engine='pyarrow')
    >>> df = pd.read_parquet('test.parquet')
    >>> df
       key
    0    1  # index name is None when the file is written with pyarrow
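
If switching the writer to pyarrow is not an option, one possible workaround is to clear the name right after reading. The following is only a sketch: the helper names `clear_placeholder_index_name` and `read_parquet_unnamed` are hypothetical, and the placeholder value 'index' is an assumption taken from the fastparquet output shown above.

```python
import pandas as pd


def clear_placeholder_index_name(df, placeholder="index"):
    """Return a copy of df with index.name set to None if it equals `placeholder`.

    `placeholder` is an assumption: "index" is the name seen in the
    fastparquet round trip above. Any other index name is left untouched.
    """
    out = df.copy()
    if out.index.name == placeholder:
        # Index.rename returns a new Index, so the caller's frame is not modified.
        out.index = out.index.rename(None)
    return out


def read_parquet_unnamed(path, **kwargs):
    """Hypothetical wrapper: read a parquet file, then drop the placeholder name."""
    return clear_placeholder_index_name(pd.read_parquet(path, **kwargs))


# Simulate the frame that comes back from the fastparquet round trip.
df = pd.DataFrame({"key": [1]}, index=pd.Index([0], name="index"))
cleaned = clear_placeholder_index_name(df)
print(cleaned.index.name)  # None
```

Note that this would also clear an index that was deliberately named 'index', so it only makes sense when that name is never used on purpose. Writing with engine='pyarrow', as in the session above, avoids the need for it.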