python pandas hdf5 pytables

HDF5 file grows in size after overwriting the pandas dataframe


I'm trying to overwrite a pandas DataFrame stored in an HDF5 file. Each time I do this, the file grows, even though the stored frame's content is the same. If I open the store with mode='w', I lose all the other records. Is this a bug, or am I missing something?

import pandas as pd

df = pd.read_csv('1.csv')
for i in range(100):
    # Re-open the store and overwrite the same key on every pass
    store = pd.HDFStore('tmp.h5')
    store.put('TMP', df)
    store.close()

tmp.h5 grows in size on every iteration.


Solution

  • Read the big warning in the pandas HDF5 (PyTables) documentation on deleting from a table

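    This is how HDF5 works: it does not automatically reclaim the space freed when a node is deleted or overwritten, so repeatedly rewriting the same key makes the file grow. To recover the space, repack the file with ptrepack.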
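
    One way to get the space back is to copy the file into a fresh one with the ptrepack utility that ships with PyTables, then swap the copy in. The sketch below is only an illustration under stated assumptions: the helper names build_ptrepack_command and repack_in_place are made up for this example, the '.repacked' suffix is arbitrary, and it assumes ptrepack is on your PATH and that no store is open on the file while it runs.

```python
import os
import shutil
import subprocess


def build_ptrepack_command(src, dst):
    """Return the argument list that repacks src into a new file dst.

    Hypothetical helper for this example, not part of pandas or PyTables.
    """
    # --chunkshape=auto recomputes chunk shapes for the copied data;
    # --propindexes carries existing table indexes over to the copy.
    return ["ptrepack", "--chunkshape=auto", "--propindexes", src, dst]


def repack_in_place(path):
    """Rewrite the HDF5 file at path so that unused space is dropped.

    Hypothetical helper; assumes nothing else has the file open.
    """
    if shutil.which("ptrepack") is None:
        raise RuntimeError("ptrepack not found; it is installed with PyTables")

    tmp_path = path + ".repacked"  # arbitrary temporary name (assumption)
    try:
        # ptrepack writes only the live nodes into the new file.
        subprocess.run(build_ptrepack_command(path, tmp_path), check=True)
        # Replace the bloated original with the compact copy.
        os.replace(tmp_path, path)
    finally:
        # Clean up the temporary file if the repack or the swap failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

    With the loop from the question, calling repack_in_place('tmp.h5') after the last store.close() should shrink the file back down. Repacking rewrites the whole file, so it is probably better done occasionally than after every put.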