Python: fromfile array too big and/or gzip will not write


My goal is to extract a large subimage from an even larger uncompressed image of shape (30000, 65536) without reading the whole image into memory, then save the subimage in a compressed format. For now I only care about how well the subimage compresses, as an indicator of image complexity; I don't need to save it in a viewable format, though I would love to. This is my first Python script, and I am getting stuck on the size limits of some function calls.

I get two different errors from two alternative attempts (boring lines removed):

Version 1:

fd = open(fname,'rb')
h5file=h5py.File(h5fname, "w")
data = h5file.create_dataset("data", (Height, Width), dtype='i', maxshape=(None, None)) # try i8?
data = fromfile(file=fd, dtype='h', count=Height*Width)  #FAIL
fd.close()
h5file.close()
outfilez = gzip.open(outfilename,'wb')
outfilez.write(data)
outfilez.close()

Error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:...\sitecustomize.py", line 523, in runfile
    execfile(filename, namespace)
  File "C:...\Script_v3.py", line 183, in <module>
    data = fromfile(file=fd, dtype=datatype, count=BandHeight_tracks*Width)
ValueError: array is too big.

Version 2 (for loop to reduce fromfile usage):

fd = open(fname,'rb')
h5file=h5py.File(h5fname, "w")
data = h5file.create_dataset("data", (Height, Width), dtype='i', maxshape=(None, None)) # try i8?
for i in range(0, Height-1):
    data[i:] = fromfile(file=fd, dtype='h', count=Width)
fd.close()
h5file.close()
outfilez = gzip.open(outfilename,'wb')
outfilez.write(data)
outfilez.close()

Error (I do not get this with the other version):

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:...\sitecustomize.py", line 523, in runfile
    execfile(filename, namespace)
  File "C:...\Script_v4.py", line 195, in <module>
    outfilez.write(data)
  File "C:...\gzip.py", line 235, in write
    self.crc = zlib.crc32(data, self.crc) & 0xffffffffL
TypeError: must be string or read-only buffer, not Dataset

I am running the code in Spyder on a 64-bit Windows 7 machine with 16 GB of RAM. The images are at most 4 GB.


Solution

  • The solution I eventually found:

    Error 1:

    I ignored this one. If I need to work with this much memory at once, I will switch to C++.

    Error 2:

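    The data type was not int16: the dataset was created with dtype='i', and data was still an h5py Dataset when it reached gzip (see the TypeError). From the terminal we see: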

    >>> data[0]
    array([2, 2, 0, ..., 3, 4, 2])
    

    It should be:

    >>> data[0]
    array([2, 2, 0, ..., 3, 4, 2], dtype=int16)
    

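    The fix was to add the following line directly after the for loop (before h5file.close()). It converts the Dataset into an in-memory int16 NumPy array, which gzip can write:

    data = int16(data)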
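
For readers who would rather stay in Python, one possible way to sidestep both errors is to map the raw file with np.memmap, copy only the wanted window into memory, and hand gzip plain bytes. This is a minimal sketch, not what I originally ran. It assumes Python 3 with NumPy and a headerless, row-major int16 file (the 'h' dtype above). The function name gzip_subimage, the file names, and the tiny synthetic image are all hypothetical.

```python
import gzip
import os
import tempfile

import numpy as np


def gzip_subimage(fname, shape, rows, cols, out_name, dtype=np.int16):
    """Gzip one window of a raw image; return compressed size / raw size."""
    # np.memmap maps the file instead of reading it, so only the pages
    # that the slice below touches are loaded.
    image = np.memmap(fname, dtype=dtype, mode="r", shape=shape)
    # np.array copies just the window into an ordinary in-memory array.
    sub = np.array(image[rows, cols])
    del image  # drop the mapping as soon as the copy exists
    raw = sub.tobytes()  # gzip wants bytes, not an array or h5py Dataset
    with gzip.open(out_name, "wb") as out:
        out.write(raw)
    return os.path.getsize(out_name) / len(raw)


# Tiny synthetic stand-in for the real image (hypothetical file names).
workdir = tempfile.mkdtemp()
raw_name = os.path.join(workdir, "image.raw")
gz_name = os.path.join(workdir, "subimage.gz")
full = np.arange(64 * 128, dtype=np.int16).reshape(64, 128)
full.tofile(raw_name)

ratio = gzip_subimage(raw_name, (64, 128), slice(8, 40), slice(16, 80), gz_name)
print("compressed/raw size ratio: %.3f" % ratio)
```

For the real image the shape would be (30000, 65536), with rows and cols set to the slices of interest. The returned ratio is the complexity indicator asked about in the question: a lower value means the subimage compressed better.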