I am fighting with Python for a school project. I copied and pasted some code from this post, changing only the variable names. I am trying to convert two files from h5 to jpg. Here is my code:
import h5py
import numpy as np
from PIL import Image
hdf = h5py.File("train_happy.h5",'r')
array = np.array(list(hdf.get("train_happy.h5")))
img = Image.fromarray(array.astype('uint8'), 'RGB')
img.save("train_happy.jpg", "JPEG")
hdf2 = h5py.File("test_happy.h5",'r')
array = np.array(list(hdf2.get("test_happy.h5")))
img = Image.fromarray(array.astype('uint8'), 'RGB')
img.save("test_happy.jpg", "JPEG")
training = 'train_happy.jpg'
testing = 'test_happy.jpg'
I know absolutely nothing about h5 files or converting files using python. Please help!
EDIT: Here is the line the error was on:
array = np.array(list(hdf.get("train_happy.h5")))
If I had to guess, I would say the same error will happen on this line:
array = np.array(list(hdf2.get("test_happy.h5")))
Also, the command:
print(list(hdf.keys()))
gives me this output:
['list_classes', 'train_set_x', 'train_set_y']
Review the example in your linked post! Your initial error is in this line:
array = np.array(list(hdf.get("train_happy.h5")))
train_happy.h5 is the name of the HDF5 file, not of anything stored inside it. hdf.get() returns None when no node with that name exists, which is why the list(...) call fails. You need to use the name of the image dataset in the HDF5 file (using group/dataset nomenclature). The output from list(hdf.keys()) indicates you have 3 nodes at the root level. Each node is either a Group or a Dataset (an array, such as image data). Without knowing exactly what you have, it's hard to write the next step. Ideally you would use isinstance() to get the node type. A very simple example is provided below to loop through your node names:
for node in list(hdf.keys()):
    print('working on node %s' % node)
    obj = hdf[node]
    if isinstance(obj, h5py.Group):
        print('%s is a Group' % node)
    elif isinstance(obj, h5py.Dataset):
        print('%s is a Dataset' % node)
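If the file might contain nested groups, a recursive walk shows more than the root-level loop does. The following is only a sketch: describe_datasets is a hypothetical helper name, and it assumes nothing about your file beyond it being a readable HDF5 file. It uses h5py's visititems() to collect the shape and dtype of every dataset, which tells you whether a node looks like image data before you try to convert it.

```python
import h5py


def describe_datasets(h5_path):
    """Return {dataset path: (shape, dtype string)} for every dataset in the file.

    Hypothetical helper for illustration; not part of h5py.
    """
    found = {}

    def visitor(name, obj):
        # visititems() calls this for every Group and Dataset below the root.
        # Returning None (the default) tells h5py to keep walking.
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, str(obj.dtype))

    with h5py.File(h5_path, "r") as hdf:
        hdf.visititems(visitor)
    return found
```

Calling describe_datasets("train_happy.h5") and printing the result should list train_set_x, train_set_y and list_classes with their shapes, assuming all three turn out to be datasets rather than groups.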
Or, you could simply hack away and try this (adjusting downstream code appropriately): [code below modified per hpaulj's comments]
array_x = hdf["train_set_x"][:]
array_y = hdf["train_set_y"][:]
The code above assumes that train_set_x and train_set_y are image datasets (similar to "Photos/Image 1" in your link).
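Once a dataset is loaded as a NumPy array, saving it as JPEG depends on its shape. The sketch below is written under a loud assumption: that the dataset holds a stack of RGB images with shape (N, height, width, 3), so each image has to be written to its own file. I have not seen your data, so check the shape first. The function name export_images and the output file naming are hypothetical choices for illustration.

```python
import os

import h5py
import numpy as np
from PIL import Image


def export_images(h5_path, dataset_name, out_dir):
    """Write each image of a (N, H, W, 3) dataset to out_dir as a JPEG.

    Hypothetical helper; assumes the dataset is a stack of RGB images.
    Returns the list of file paths that were written.
    """
    os.makedirs(out_dir, exist_ok=True)

    with h5py.File(h5_path, "r") as hdf:
        images = hdf[dataset_name][:]  # read the whole dataset into memory

    # Refuse anything that is not a stack of 3-channel images (e.g. labels).
    if images.ndim != 4 or images.shape[-1] != 3:
        raise ValueError(
            "expected shape (N, H, W, 3), got %s" % (images.shape,)
        )

    safe_name = dataset_name.strip("/").replace("/", "_")
    paths = []
    for i, pixels in enumerate(images):
        # An (H, W, 3) uint8 array is interpreted by Pillow as an RGB image.
        img = Image.fromarray(pixels.astype(np.uint8))
        path = os.path.join(out_dir, "%s_%04d.jpg" % (safe_name, i))
        img.save(path, "JPEG")
        paths.append(path)
    return paths
```

Under that assumption, export_images("train_happy.h5", "train_set_x", "train_jpg") would write one JPEG per image into a train_jpg folder. A label dataset such as train_set_y would be rejected by the shape check instead of producing meaningless images.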
Also, you DO NOT need a second file object to read a second dataset from the same file. You can reuse hdf each time, and change the name of the Group/Dataset reference as shown for array_y above. A separate declaration such as hdf2 = h5py.File("test_happy.h5",'r') is only needed when the data lives in a different file.