I would like to get the byte contents of a pandas dataframe exported as hdf5, ideally without actually saving the file (i.e., in-memory).
On Python >=3.6, <3.9 (with pandas==1.2.4 and pytables==3.6.1), the following used to work:
import pandas as pd

with pd.HDFStore(
    "in-memory-save-file",
    mode="w",
    driver="H5FD_CORE",
    driver_core_backing_store=0,
) as store:
    store.put("my_key", df, format="table")
    binary_data = store._handle.get_file_image()
Here df is the dataframe to be converted to hdf5, and the last line calls the pytables get_file_image function.
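For what it's worth, the resulting bytes can also be loaded back into a dataframe without touching disk by handing the image to the same in-memory driver via driver_core_image. A minimal sketch, assuming binary_data is the bytes object produced above and "my_key" is the key used in the snippet:

import pandas as pd

# Re-open the HDF5 image entirely in memory (no backing file on disk)
with pd.HDFStore(
    "in-memory-load-file",
    mode="r",
    driver="H5FD_CORE",
    driver_core_backing_store=0,
    driver_core_image=binary_data,
) as store:
    df_roundtrip = store.get("my_key")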
However, starting with python 3.9, I get the following error when using the snippet above:
File "tables/hdf5extension.pyx", line 523, in tables.hdf5extension.File.get_file_image
tables.exceptions.HDF5ExtError: Unable to retrieve the size of the buffer for the file image. Plese note that not all drivers provide support for image files.
The error is raised by the same pytables get_file_image function, apparently due to issues while retrieving the size of the buffer for the file image. I don't understand the ultimate reason for it, though.
I have tried other alternatives, such as saving to a BytesIO file object, so far unsuccessfully.
How can I keep the hdf5 binary of a pandas dataframe in-memory on python 3.9?
The fix was to do conda install -c conda-forge pytables instead of pip install pytables. I still don't understand the ultimate reason behind the error, though.
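For reference, one way to see what differs between the pip wheel and the conda-forge build is to check the HDF5 library that pytables was compiled against. This is only a diagnostic sketch, not an explanation of the root cause:

import tables

# Report the pytables version and the HDF5 library it was built against,
# useful for comparing the pip and conda-forge installations.
print(tables.__version__)
print(tables.hdf5_version)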