The nature of my job requires me to make a lot of PDF images of the data I am analyzing. In the end I only use maybe 10% of the images as proof of concept, but I still want to save all of them in case people want to scrutinize my work.
I am thinking of something like storing the PDF files in an HDF5 file, but as far as I am aware this is not possible (my only interface to HDF5 files is through the h5py module in Python).
Do you have any recommendations?
It is possible to store (many) PDF files within an HDF5 file. One way is to create one dataset per PDF, with an opaque datatype and a single dimension whose size equals the size of the PDF file in bytes. If you are not bound to a specific technology, you could solve your use case with HDFql as follows:
# import HDFql package
import HDFql
# create an HDF5 file named 'pdf.h5' and use (i.e. open) it
HDFql.execute("CREATE AND USE FILE pdf.h5")
# get all files contained in root directory '/my_dir' which, for the sake
# of this example, contains the PDFs to store in the HDF5 file
HDFql.execute("SHOW FILE /my_dir/")
i = 0
while HDFql.cursor_next() == HDFql.SUCCESS:
    # get name of the next PDF file
    file_name = HDFql.cursor_get_char()
    # create a dataset holding the raw bytes of the PDF file
    HDFql.execute("CREATE DATASET dset_%d VALUES FROM BINARY FILE \"/my_dir/%s\"" % (i, file_name))
    i += 1