arrays, python-3.x, numpy, pytables

Pytables Value Error (rank of the appended object and "..."EArray differ)


I am trying to use PyTables to store my image dataset. I am using an EArray and appending each image as it is read. The dimensions of my EArray and of each image are the same (except for the first, along which appending is done). I am using the following code:

atom = Atom.from_dtype(np.dtype(np.uint32,(278,278,1)))
i=0
for <read each image from folder using os into img>:
    im = cv2.imread(img.path,0)
    im = np.expand_dims(im,2) #this is because keras needs 3d images and grayscale images are 2d
    if not i:
        X = data.create_earray(dataGroup,"X",atom,(0,)+im.shape,chunkshape=(20,20,20,1))
    X.append(np.expand_dims(im,0)) #as appending requires the same number of dims
    i=1

But when I run the code, it still gives me a ValueError saying that my object's rank is 1 and X's rank is 4. How is that possible when I set the shape of X using im? I even tried printing the shape of im; it gives (278,278,1). So, what is the problem? I am using PyTables for the first time, so I don't know it in depth.


Solution

  • Adding a second answer with a more complicated write method plus an EArray.read example. Frankly, I prefer the simpler method from my first answer: create the EArray with obj= defined, and let PyTables handle the data structures. However, if you prefer to manage this yourself, see example 2 (below). Key items to note:

    • The EArray shape has 4 dimensions, with axis 0 set to zero (this defines
      the direction that will be extended). The Atom only supplies the element type (uint32).
    • im = np.expand_dims(im,0) is not done until AFTER im.shape is referenced in the definition of the EArray shape at creation.

    [UPDATED CODE BELOW]

    import tables as tb, numpy as np
    data = tb.open_file("image_data1.h5", mode='w')
    dataGroup = data.create_group(data.root, 'MyData')
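    # The second positional argument of np.dtype is align, not a shape, so a plain
    # scalar uint32 atom is used; the 4-D layout comes from the shape given to create_earray.
    MyAtom = tb.Atom.from_dtype(np.dtype(np.uint32))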
    
    im = np.arange(278*278).reshape((278,278))
    im = np.expand_dims(im,2)
    
    X = data.create_earray(dataGroup,"X", MyAtom, (0,)+im.shape)
    
    im = np.expand_dims(im,0)
    X.append( im )
    
    print ('flavor =', X.flavor )
    print ('dim=', X.ndim, ', rows = ', X.nrows)
    
    im = np.arange(278*278,278*278+278*278).reshape((278,278))
    im = np.expand_dims(im,2)
    im = np.expand_dims(im,0)
    
    X.append( im )
    
    print ('dim=', X.ndim, ', rows = ', X.nrows)
    
    data.close()
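
For comparison, the simpler obj= approach mentioned at the top of this answer could look roughly like the sketch below. It is an illustration only, not the code from the other answer: the scratch file name, the group name, and the synthetic 278x278 arrays are assumptions made up for the example. When obj= is given without atom or shape, PyTables infers both from the array and treats the first axis as the extendable one.

```python
import os
import tempfile

import numpy as np
import tables as tb

# Hypothetical scratch file; any writable path would do.
h5_path = os.path.join(tempfile.mkdtemp(), "image_data_obj.h5")

# Two synthetic 278x278 grayscale "images" with a trailing channel axis.
first = np.arange(278 * 278, dtype=np.uint32).reshape(278, 278, 1)
second = first + 278 * 278

with tb.open_file(h5_path, mode="w") as data:
    group = data.create_group(data.root, "MyData")
    # obj= supplies the first row; atom and shape are inferred from it.
    X = data.create_earray(group, "X", obj=first[np.newaxis, ...])
    # Later images are appended with the same leading axis of length 1.
    X.append(second[np.newaxis, ...])
    shape = tuple(int(s) for s in X.shape)
    dtype_name = X.atom.dtype.name

print(shape, dtype_name)
```

Because the first row carries the dtype and the per-image shape, there is no separate Atom or shape definition that could drift out of sync with the data.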
    

    Here are the lines you need to read the data from EArray X (with a couple of print statements to verify the values in the corners). Run them while the file is still open, that is, before data.close(). This should work so long as the EArray flavor is NumPy (as it is in my example). You can also use the out= parameter to specify a NumPy array to receive the output data. There are other methods to access EArray data, including .iterrows() to iterate, and .__getitem__() to slice with fancy indexing. Read the PyTables documentation if you want to do any of these.

    Y_1 = X.read( 0 )
    print (Y_1[0,0,0])
    print (Y_1[-1,-1,-1])
    
    Y_2 = X.read( 1 )
    print (Y_2[0,0,0])
    print (Y_2[-1,-1,-1])
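
The other access methods mentioned above (slicing, .iterrows(), and out=) can be sketched as follows. This is a minimal illustration under stated assumptions: the scratch file, the node name, and the tiny 4x4x1 arrays are hypothetical stand-ins for the real images, chosen so the values are easy to check.

```python
import os
import tempfile

import numpy as np
import tables as tb

# Hypothetical scratch file holding three small 4x4x1 "images".
h5_path = os.path.join(tempfile.mkdtemp(), "read_demo.h5")
images = np.arange(3 * 4 * 4, dtype=np.uint32).reshape(3, 4, 4, 1)

with tb.open_file(h5_path, mode="w") as data:
    data.create_earray(data.root, "X", obj=images)

with tb.open_file(h5_path, mode="r") as data:
    X = data.root.X

    # Slicing goes through __getitem__ and returns NumPy arrays.
    second = X[1]        # one image, shape (4, 4, 1)
    last_two = X[1:3]    # two images, shape (2, 4, 4, 1)

    # iterrows() yields one row (image) at a time along the first axis.
    row_sums = [int(row.sum()) for row in X.iterrows()]

    # out= fills a preallocated, C-contiguous array of matching size.
    buffer = np.empty((2, 4, 4, 1), dtype=np.uint32)
    X.read(0, 2, out=buffer)

print(second.shape, last_two.shape, row_sums)
```

Slicing is usually the most convenient form; out= is mainly useful when the same buffer is reused across many reads.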