Tags: python, pandas, hdf5, pytables

Unable to save DataFrame to HDF5 ("object header message is too large")


I have a DataFrame in Pandas:

In [7]: my_df
Out[7]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 34 entries, 0 to 0
Columns: 2661 entries, airplane to zoo
dtypes: float64(2659), object(2)

When I try to save this to disk:

store = pd.HDFStore(p_full_h5)
store.append('my_df', my_df)

I get:

  File "H5A.c", line 254, in H5Acreate2
    unable to create attribute
  File "H5A.c", line 503, in H5A_create
    unable to create attribute in object header
  File "H5Oattribute.c", line 347, in H5O_attr_create
    unable to create new attribute in header
  File "H5Omessage.c", line 224, in H5O_msg_append_real
    unable to create new message
  File "H5Omessage.c", line 1945, in H5O_msg_alloc
    unable to allocate space for message
  File "H5Oalloc.c", line 1142, in H5O_alloc
    object header message is too large

End of HDF5 error back trace

Can't set attribute 'non_index_axes' in node:
 /my_df(Group) u''.

Why?

Note: In case it matters, the DataFrame column names are simple small strings:

In [12]: max([len(x) for x in list(my_df.columns)])
Out[12]: 47

This is all with Pandas 0.11 and the latest stable versions of IPython, Python, and HDF5.


Solution

  • HDF5 has a 64 KB limit on the object header, which holds all of the columns' metadata (names, dtypes, and so on). At roughly 2000 columns you run out of space for that metadata. This is a fundamental limitation of PyTables, and it is unlikely they will add a workaround on their side any time soon. You will either have to split the table up or choose another storage format.
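One way to split the table up, as a rough sketch: store column chunks under separate keys in the same HDF5 file, then concatenate them back together on read. The DataFrame shape, chunk size, and key names below are illustrative, not from the original question.

```python
import tempfile, os
import numpy as np
import pandas as pd

# Hypothetical wide DataFrame standing in for my_df (shape and names are made up).
wide = pd.DataFrame(np.random.rand(34, 2661),
                    columns=[f"col_{i}" for i in range(2661)])

chunk_size = 1000  # keep each table's column metadata well under the ~64 KB header limit

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "my_store.h5")

# Write each slice of columns under its own key, so no single
# table's header has to describe all 2661 columns.
with pd.HDFStore(path, mode="w") as store:
    for i, start in enumerate(range(0, wide.shape[1], chunk_size)):
        store.put(f"my_df/chunk_{i}", wide.iloc[:, start:start + chunk_size])

# Reassemble by concatenating the chunks along the column axis,
# in key order so the original column order is preserved.
with pd.HDFStore(path) as store:
    parts = [store[key] for key in sorted(store.keys())]
    restored = pd.concat(parts, axis=1)
```

With only a handful of chunks, sorting the keys lexicographically preserves the order; with ten or more chunks you would want zero-padded suffixes (`chunk_00`, `chunk_01`, ...) so that `chunk_10` does not sort before `chunk_2`.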