My goal is to extract a large subimage from an even larger uncompressed image (30000 x 65536) without reading the whole image into memory, then save the subimage in a compressed format. At the moment I only care about how well the compression works, as an indicator of image complexity; I don't need to save the image in a viewable format, but I would love to. This is my first Python script and I am getting stuck on the size limits of some function calls.
I get two related errors from two alternative attempts (boring lines removed):
Version 1:
import gzip
import h5py
from numpy import fromfile

fd = open(fname, 'rb')
h5file = h5py.File(h5fname, "w")
data = h5file.create_dataset("data", (Height, Width), dtype='i', maxshape=(None, None))  # try i8?
data = fromfile(file=fd, dtype='h', count=Height*Width)  # FAIL: asks for the whole image at once
fd.close()
h5file.close()
outfilez = gzip.open(outfilename, 'wb')
outfilez.write(data)
outfilez.close()
Error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:...\sitecustomize.py", line 523, in runfile
    execfile(filename, namespace)
  File "C:...\Script_v3.py", line 183, in <module>
    data = fromfile(file=fd, dtype=datatype, count=BandHeight_tracks*Width)
ValueError: array is too big.
Version 2 (for loop to reduce fromfile usage):
fd = open(fname, 'rb')
h5file = h5py.File(h5fname, "w")
data = h5file.create_dataset("data", (Height, Width), dtype='i', maxshape=(None, None))  # try i8?
for i in range(Height):
    data[i, :] = fromfile(file=fd, dtype='h', count=Width)  # one row per read
fd.close()
h5file.close()
outfilez = gzip.open(outfilename, 'wb')
outfilez.write(data)
outfilez.close()
Error (I do not get this with the other version):
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:...\sitecustomize.py", line 523, in runfile
    execfile(filename, namespace)
  File "C:...\Script_v4.py", line 195, in <module>
    outfilez.write(data)
  File "C:...\gzip.py", line 235, in write
    self.crc = zlib.crc32(data, self.crc) & 0xffffffffL
TypeError: must be string or read-only buffer, not Dataset
I am running the code using Spyder on a 64-bit Win7 machine with 16 GB RAM. The images are at most 4 GB.
Solution as I found out eventually:
Error 1:
Ignore. The fromfile call asks for the entire Height*Width image (roughly 4 GB of int16) in a single allocation, which is why only Version 1 hits it; the row-by-row reads in Version 2 avoid it. If I really need to operate on that much memory at once, switch to C++.
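For reference, numpy.memmap is another way to dodge the big allocation: the file is mapped instead of read, and only the slice that actually gets copied ever touches memory. A minimal sketch, reusing fname/outfilename/Height/Width from above and assuming the file is nothing but row-major int16 pixel data (the subimage bounds below are made up):

import gzip
import numpy as np

# Map the raw file; nothing is loaded until a slice is copied out.
img = np.memmap(fname, dtype=np.int16, mode='r', shape=(Height, Width))

# Pull only the subimage of interest into memory (made-up bounds).
sub = np.array(img[5000:25000, 10000:50000])

# Compress its raw bytes; the compressed size is the complexity indicator.
outfilez = gzip.open(outfilename, 'wb')
outfilez.write(sub.tostring())
outfilez.close()

The subimage copy still has to fit in RAM, but the full 4 GB image never does.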
Error 2:
The type was not set somehow. From the terminal we see:
>>> data[0]
array([2, 2, 0, ..., 3, 4, 2])
It should be:
>>> data[0]
array([2, 2, 0, ..., 3, 4, 2], dtype=int16)
The fix is that I added the following line after the for loop (int16 here is numpy.int16; besides setting the dtype, it pulls the HDF5 dataset into an ordinary in-memory NumPy array, which is why gzip.write stops complaining about getting a Dataset instead of a buffer):
data = int16(data)
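For completeness, a sketch of an alternative that sidesteps the gzip/Dataset clash entirely: h5py can compress the dataset itself with its built-in gzip filter, and the compressed storage size then serves as the complexity measure. This reuses the names from above; the chunk shape and compression level are arbitrary choices, not something from my script:

import h5py
from numpy import fromfile

fd = open(fname, 'rb')
h5file = h5py.File(h5fname, "w")
data = h5file.create_dataset("data", (Height, Width), dtype='int16',
                             chunks=(1, Width),  # one row per chunk, matches the write pattern
                             compression="gzip", compression_opts=4)
for i in range(Height):
    data[i, :] = fromfile(file=fd, dtype='h', count=Width)
fd.close()
print(data.id.get_storage_size())  # bytes actually stored after compression
h5file.close()

This keeps the HDF5 copy and the compressed-size measurement in one step, with no Dataset-versus-buffer trouble.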