I'm running the following function for an ML model.
def get_images(filename):
    bin_file = open(filename, 'rb')
    buf = bin_file.read()  # read the whole file into memory
    bin_file.close()  # release the file handle back to the operating system
    index = 0
    magic, num_images, num_rows, num_colums = struct.unpack_from(big_endian + four_bytes, buf, index)
    index += struct.calcsize(big_endian + four_bytes)
    images = []  # temp images as tuple
    for x in range(num_images):
        im = struct.unpack_from(big_endian + picture_bytes, buf, index)
        index += struct.calcsize(big_endian + picture_bytes)
        im = list(im)
        for i in range(len(im)):
            if im[i] > 1:
                im[i] = 1
However, I am receiving an error at the line:
im = struct.unpack_from(big_endian + picture_bytes, buf, index)
With the error:
error: unpack_from requires a buffer of at least 784 bytes
I have noticed that this error only occurs on certain iterations, and I cannot figure out why. The dataset is the standard MNIST dataset, which is freely available online.
I have also looked through similar questions on SO (e.g. error: unpack_from requires a buffer) but they don't seem to resolve the issue.
You didn't include the struct formats in your MRE (minimal reproducible example), so it is hard to say why you are getting the error. Either you are using a partial or corrupted file, or your struct formats are wrong.
This answer uses the test file 't10k-images-idx3-ubyte.gz'
and file formats found at http://yann.lecun.com/exdb/mnist/
Open the file and read it into a bytes object (gzip is used because of the file's type).
import gzip, struct
with gzip.open(r'my\path\t10k-images-idx3-ubyte.gz', 'rb') as f:
    data = bytes(f.read())
print(len(data))
The file format spec says the header is 16 bytes (four 32-bit ints). Separate it from the pixels with a slice, then unpack it.
hdr,pixels = data[:16],data[16:]
magic, num_images, num_rows, num_cols = struct.unpack(">4L",hdr)
# print(len(hdr),len(pixels))
# print(magic, num_images, num_rows, num_cols)
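A partial or corrupted file, one of the possible causes mentioned above, can be detected by comparing the number of pixel bytes with what the header promises. Here is a minimal sketch, assuming the IDX image layout from the format page (16-byte header, one unsigned byte per pixel, magic number 2051 for image files). `check_idx_images` is a hypothetical helper name, and the buffer is synthetic rather than real MNIST data.

```python
import struct

def check_idx_images(data):
    """Return (expected, actual) pixel-byte counts for an IDX image buffer.

    Hypothetical helper for illustration; not part of any library.
    """
    if len(data) < 16:
        raise ValueError("buffer is too short to hold the 16-byte header")
    magic, num_images, num_rows, num_cols = struct.unpack(">4L", data[:16])
    if magic != 2051:  # 0x00000803, the magic number listed for image files
        raise ValueError(f"unexpected magic number: {magic}")
    expected = num_images * num_rows * num_cols
    actual = len(data) - 16
    return expected, actual

# Synthetic buffer: the header claims 2 images of 2x2 pixels,
# but only 6 pixel bytes follow (i.e. the "file" is truncated).
truncated = struct.pack(">4L", 2051, 2, 2, 2) + bytes(6)
expected, actual = check_idx_images(truncated)
print(expected, actual)
```

If `actual` turns out smaller than `expected`, unpacking would be expected to fail partway through the loop, once fewer bytes remain than one image needs.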
There are a number of ways to iterate over the individual images.
img_size = num_rows * num_cols
imgfmt = "B"*img_size
for i in range(num_images):
    start = i * img_size
    end = start + img_size
    img = pixels[start:end]
    img = struct.unpack(imgfmt,img)
    # do work on the img
Or...
imgfmt = "B"*img_size
for img in struct.iter_unpack(imgfmt, pixels):
    img = [p if p == 0 else 1 for p in img]
The itertools grouper recipe would probably also work.
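For reference, a rough sketch of the grouper approach, using a simplified form of the recipe from the itertools documentation. Iterating over a bytes object already yields ints, so `struct` is not needed here. The `pixels` value below is a small synthetic stand-in, not real MNIST data.

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Collect items into fixed-length chunks (simplified itertools recipe)."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# Synthetic stand-in for the pixel bytes: two 2x2 "images".
pixels = bytes([0, 5, 255, 0, 1, 0, 0, 128])
img_size = 4

# Binarize each image: 0 stays 0, anything else becomes 1.
images = [[0 if p == 0 else 1 for p in img] for img in grouper(pixels, img_size)]
print(images)  # [[0, 1, 1, 0], [1, 0, 0, 1]]
```

Note that `zip_longest` pads a short final group with `fillvalue` instead of raising, so a truncated file would go unnoticed unless the lengths are checked first.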