python numpy upload numpy-memmap

Numpy load part of *.npz file in mmap_mode


I know there already exists a similar question, which has not been answered.

I have a very large numpy array saved in an npz file. I don't want to load it completely (it does not fit in my RAM); I only want to load a part of it.

This is how the file was generated:

np.savez_compressed('file_name.npz', xxx)

And this is how I would like to load it:

xxx = np.load('file_name.npz', mmap_mode="r")

Now, to actually access the part of the array I am interested in, I should type

a = xxx['arr_0'][0][0][0]

But although this piece is quite small, Python first loads the whole array (I can tell because my RAM fills up) and only then shows this small part. The same happens if I write directly

xxx = np.load('file_name.npz', mmap_mode="r")['arr_0'][0][0][0]

What am I doing wrong?


Solution

  • mmap_mode does not work with an npz file. An npz is a zip archive; that is, it contains npy files, one per key. You can see this by opening the npz file with an OS archive manager tool.

    I'm a little surprised that your load call doesn't raise an error, but looking at the code I see that it dispatches to the NpzFile loader without even looking at the mmap_mode parameter.

    To use mmap, you'll have to extract arr_0.npy (again using the OS tool), and use load on it.
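
The extraction step can also be done from Python instead of an OS tool. The following is a minimal sketch, not the only way to do it: it assumes the standard-library zipfile module is acceptable for the extraction, and the helper name extract_npy and the small stand-in array are hypothetical, used only for illustration. Note that the extracted arr_0.npy is uncompressed, so there must be enough disk space for the full array.

```python
import os
import tempfile
import zipfile

import numpy as np


def extract_npy(npz_path, key, out_dir):
    """Extract one member of an .npz archive to disk and return its path.

    Hypothetical helper: each key in an npz is stored as '<key>.npy'.
    """
    with zipfile.ZipFile(npz_path) as archive:
        # extract() writes the decompressed member into out_dir
        return archive.extract(key + ".npy", path=out_dir)


# Small stand-in for the large array from the question (assumption).
tmp_dir = tempfile.mkdtemp()
npz_path = os.path.join(tmp_dir, "file_name.npz")
data = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
np.savez_compressed(npz_path, data)  # stored under the key 'arr_0'

# The archive really is a zip file holding one .npy per key.
with zipfile.ZipFile(npz_path) as archive:
    members = archive.namelist()

# Extract the .npy file, then memory-map it.
npy_path = extract_npy(npz_path, "arr_0", tmp_dir)
arr = np.load(npy_path, mmap_mode="r")

# Only the requested element is read from disk here.
a = arr[0][0][0]
```

Here arr is a np.memmap, so indexing reads only the pages that back the requested elements. If the array is going to be read this way repeatedly, saving it with np.save as a plain npy file in the first place avoids the extraction step.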