Tags: python, jpeg, h5py, file-conversion

TypeError: 'NoneType' object is not iterable (h5 file)


I am fighting with Python for a school project. I copied and pasted some code from this post, changing only the variable names. I am trying to convert two files from h5 to jpg. Here is my code:

import h5py
import numpy as np
from PIL import Image

hdf = h5py.File("train_happy.h5",'r')
array = np.array(list(hdf.get("train_happy.h5")))
img = Image.fromarray(array.astype('uint8'), 'RGB')
img.save("train_happy.jpg", "JPEG")

hdf2 = h5py.File("test_happy.h5",'r')
array = np.array(list(hdf2.get("test_happy.h5")))
img = Image.fromarray(array.astype('uint8'), 'RGB')
img.save("test_happy.jpg", "JPEG")

training = 'train_happy.jpg'
testing = 'test_happy.jpg'

I know absolutely nothing about h5 files or converting files using python. Please help!

EDIT: Here is the line the error was on:

array = np.array(list(hdf.get("train_happy.h5")))

If I had to guess, I would say the same error will happen on this line:

array = np.array(list(hdf2.get("test_happy.h5")))

Also, the command:

print(list(hdf.keys()))

gives me this output:

['list_classes', 'train_set_x', 'train_set_y']

Solution

  • Review the example in your linked post! Your initial error is in this line:

    array = np.array(list(hdf.get("train_happy.h5")))
    

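    train_happy.h5 is the name of the HDF5 file, not of anything inside it. hdf.get() looks up a node by its path inside the file and returns None when no node has that name, so list(None) then raises TypeError: 'NoneType' object is not iterable. You need to use the name of the image dataset in the HDF5 file (using group/dataset nomenclature). The output from list(hdf.keys()) indicates you have 3 nodes at the root level. Each node is either a Group or a Dataset (a Dataset can hold images, labels, or any other array). Without knowing exactly what you have, it's hard to write the next step. Ideally you would use the built-in isinstance() to get the node type. A very simple example is provided below to loop through your node names: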

    for node in hdf.keys():
        print('working on node %s' % node)
        obj = hdf[node]  # renamed from 'object' to avoid shadowing the built-in
        if isinstance(obj, h5py.Group):
            print('%s is a Group' % node)
        elif isinstance(obj, h5py.Dataset):
            print('%s is a Dataset' % node)
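
To see where the error itself comes from, here is a minimal sketch that reproduces it on a small throwaway file. The in-memory file (driver="core" with backing_store=False, so nothing is written to disk) and its zero-filled dataset are assumptions for illustration; only the dataset name train_set_x comes from your keys() output.

```python
import h5py
import numpy as np

# Throwaway in-memory HDF5 file; the contents are dummy zeros, not your data.
hdf = h5py.File("demo.h5", "w", driver="core", backing_store=False)
hdf.create_dataset("train_set_x", data=np.zeros((2, 4, 4, 3), dtype="uint8"))

# .get() looks up a node *inside* the file. The file's own name is not a node,
# so .get() returns None instead of raising KeyError.
missing = hdf.get("train_happy.h5")
print(missing)  # None

try:
    list(missing)  # the same call the question makes
except TypeError as err:
    message = str(err)
    print(message)  # 'NoneType' object is not iterable

# Asking for a dataset name that actually exists works.
array_x = hdf.get("train_set_x")[:]
print(array_x.shape)  # (2, 4, 4, 3)
hdf.close()
```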
    

    Or, you could simply hack away and try this (adjusting downstream code appropriately): [code below modified per hpaulj's comments]

    array_x = hdf["train_set_x"][:]  # index the file like a dict; [:] reads the dataset into a NumPy array
    array_y = hdf["train_set_y"][:]
    

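    The code above assumes that train_set_x and train_set_y are image datasets (similar to "Photos/Image 1" in your link). Going by the usual x/y naming, train_set_y more likely holds the class labels, so check each dataset's .shape before treating it as an image.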
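
Since the x/y names are only a hint, one way to check is to walk the whole file and print every dataset's shape and dtype. This is a sketch: describe() and looks_like_rgb_images() are hypothetical helper names, the "(count, height, width, 3)" test is only a heuristic, and the demo file below uses made-up shapes rather than the contents of train_happy.h5.

```python
import h5py
import numpy as np


def describe(hdf):
    """Return {path: (shape, dtype)} for every Dataset in an open HDF5 file."""
    info = {}

    def visit(name, obj):
        # visititems() calls this for every Group and Dataset below the root.
        if isinstance(obj, h5py.Dataset):
            info[name] = (obj.shape, str(obj.dtype))

    hdf.visititems(visit)
    return info


def looks_like_rgb_images(shape):
    """Heuristic: a stack of RGB images is usually (count, height, width, 3)."""
    return len(shape) == 4 and shape[-1] == 3


# Dummy in-memory file reusing the question's dataset names; shapes are made up.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as hdf:
    hdf.create_dataset("list_classes", data=np.array([0, 1]))
    hdf.create_dataset("train_set_x", data=np.zeros((5, 8, 8, 3), dtype="uint8"))
    hdf.create_dataset("train_set_y", data=np.zeros(5, dtype="int64"))
    info = describe(hdf)

for name, (shape, dtype) in info.items():
    print(name, shape, dtype, "images?", looks_like_rgb_images(shape))
```

On your own file you would call describe(hdf) on the handle you already opened; a 4-D dataset whose last dimension is 3 is the likely image stack, and a 1-D one is more likely labels.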

    Also, you DO NOT need a second file handle to read a second dataset from the same file. You can reuse hdf each time and change the name of the Group/Dataset reference, as shown for array_y above. test_happy.h5 is a separate file, though, so it still needs its own h5py.File("test_happy.h5", 'r') call.
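
Image.fromarray() builds one image from one (height, width, 3) array, so passing the whole (N, height, width, 3) stack at once fails; each image has to be saved on its own. A minimal sketch, assuming the image datasets hold 8-bit RGB pixels (0-255) with shape (N, H, W, 3) and that test_happy.h5 stores its images under test_set_x (check with keys() first). export_images() is a hypothetical helper, and the demo writes two small random dummy files to a temporary directory instead of reading yours.

```python
import os
import tempfile

import h5py
import numpy as np
from PIL import Image


def export_images(h5_path, dataset_name, out_dir, prefix):
    """Save each image of an (N, H, W, 3) dataset as its own JPEG file.

    Assumes 8-bit RGB pixel values (0-255). Returns the written paths."""
    os.makedirs(out_dir, exist_ok=True)
    with h5py.File(h5_path, "r") as hdf:  # each .h5 file is opened on its own
        images = hdf[dataset_name][:]     # whole dataset as a NumPy array
    paths = []
    for i, pixels in enumerate(images):
        # One (H, W, 3) uint8 slice; Pillow infers RGB mode from its shape.
        img = Image.fromarray(pixels.astype("uint8"))
        path = os.path.join(out_dir, "%s_%03d.jpg" % (prefix, i))
        img.save(path, "JPEG")
        paths.append(path)
    return paths


# Dummy stand-ins for train_happy.h5 and test_happy.h5 (random pixels).
work = tempfile.mkdtemp()
for fname, dname, count in [("train_happy.h5", "train_set_x", 3),
                            ("test_happy.h5", "test_set_x", 2)]:
    with h5py.File(os.path.join(work, fname), "w") as f:
        pixels = np.random.randint(0, 256, (count, 8, 8, 3), dtype="uint8")
        f.create_dataset(dname, data=pixels)

train_paths = export_images(os.path.join(work, "train_happy.h5"), "train_set_x",
                            os.path.join(work, "train"), "train_happy")
test_paths = export_images(os.path.join(work, "test_happy.h5"), "test_set_x",
                           os.path.join(work, "test"), "test_happy")
print(len(train_paths), len(test_paths))  # 3 2
```

Writing one JPEG per image keeps each file viewable on its own; if your pipeline only needs the arrays for training, you can skip the JPEG step and use the NumPy arrays directly.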