I have a case where I would like to open a compressed numpy file using mmap mode, but can't seem to find any documentation about how it will work under the covers. For example, will it decompress the archive in memory and then mmap it? Will it decompress on the fly?
The documentation is absent for that configuration.
The short answer, based on looking at the code, is that archiving and compression, whether using `np.savez` or `gzip`, is not compatible with accessing files in `mmap_mode`. It's not just a matter of how it is done, but whether it can be done at all.
Relevant bits in the `np.load` function:
    elif isinstance(file, gzip.GzipFile):
        fid = seek_gzip_factory(file)
    ...
    if magic.startswith(_ZIP_PREFIX):
        # zip-file (assume .npz)
        # Transfer file ownership to NpzFile
        tmp = own_fid
        own_fid = False
        return NpzFile(fid, own_fid=tmp)
    ...
    if mmap_mode:
        return format.open_memmap(file, mode=mmap_mode)
Look at `np.lib.npyio.NpzFile`. An `npz` file is a ZIP archive of `.npy` files. It loads a dictionary-like object, and only loads the individual variables (arrays) when you access them (e.g. `obj[key]`). There's no provision in its code for opening those individual files in `mmap_mode`.
So a file created with `np.savez` cannot be accessed as a memory map. The ZIP archiving and compression is not the same as the gzip compression addressed earlier in `np.load`.
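As a quick check of the `npz` case, here is a minimal sketch. The file name and array contents are made up for illustration, and the behaviour is what I'd expect from the code path quoted above rather than something the documentation promises: `np.load` returns the `NpzFile` container even when `mmap_mode` is given, and the arrays it hands back are ordinary in-memory arrays.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch location and array names, for illustration only.
workdir = tempfile.mkdtemp()
npz_path = os.path.join(workdir, "arrays.npz")
np.savez_compressed(npz_path, a=np.arange(10), b=np.ones((2, 3)))

# mmap_mode is accepted, but the ZIP branch returns before it is consulted.
archive = np.load(npz_path, mmap_mode="r")
container_type = type(archive).__name__

# Accessing a key decompresses that member and reads it fully into memory.
member = archive["a"]
member_is_memmap = isinstance(member, np.memmap)
archive.close()

print(container_type, member_is_memmap)
```

In other words, `mmap_mode` appears to be silently ignored here instead of raising an error.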
But what of a single array saved with `np.save` and then gzipped? Note that `format.open_memmap` is called with `file`, not `fid` (which might be a gzip file).
More details on `open_memmap` in `np.lib.npyio.format`. Its first test is that `file` must be a string, not an already-open file object such as `fid`. It ends up delegating the work to `np.memmap`. I don't see any provision in that function for `gzip`.
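If memory mapping is what you need, the only route I can see is to decompress to a plain `.npy` file on disk first and map that. The sketch below is an assumption about a reasonable workaround, not something `np.load` does for you. The paths are hypothetical, and it also checks that `open_memmap` refuses an already-open file handle.

```python
import gzip
import os
import shutil
import tempfile

import numpy as np

# Hypothetical scratch files, for illustration only.
workdir = tempfile.mkdtemp()
npy_path = os.path.join(workdir, "arr.npy")
gz_path = npy_path + ".gz"
restored_path = os.path.join(workdir, "restored.npy")

original = np.arange(12, dtype=np.float64).reshape(3, 4)
np.save(npy_path, original)

# Simulate "np.save, then gzip".
with open(npy_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
    shutil.copyfileobj(src, dst)

# open_memmap wants a file name; an open handle should be rejected.
with open(npy_path, "rb") as handle:
    try:
        np.lib.format.open_memmap(handle, mode="r")
        handle_rejected = False
    except (ValueError, TypeError):
        handle_rejected = True

# Workaround: decompress to an uncompressed .npy file, then memory-map it.
with gzip.open(gz_path, "rb") as src, open(restored_path, "wb") as dst:
    shutil.copyfileobj(src, dst)

mapped = np.load(restored_path, mmap_mode="r")
is_memmap = isinstance(mapped, np.memmap)
matches = bool(np.array_equal(mapped, original))
print(handle_rejected, is_memmap, matches)
```

The cost is the disk space for the uncompressed copy, but after that only the pages you actually touch are read into memory.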