The Python documentation for numpy.savez, which saves an .npz file, is:
The .npz file format is a zipped archive of files named after the variables they contain. The archive is not compressed and each file in the archive contains one variable in .npy format. [...]
When opening the saved .npz file with load a NpzFile object is returned. This is a dictionary-like object which can be queried for its list of arrays (with the .files attribute), and for the arrays themselves.
My question is: what is the point of numpy.savez?
Is it just a more elegant version (shorter command) to save multiple arrays, or is there a speed-up in the saving/reading process? Does it occupy less memory?
There are two parts to the answer.
As we can already read in the docs, the .npy format is:
the standard binary file format in NumPy for persisting a single arbitrary NumPy array on disk. ... The format is designed to be as simple as possible while achieving its limited goals. (sources)
And .npz is only a
simple way to combine multiple arrays into a single file, one can use ZipFile to contain multiple “.npy” files. We recommend using the file extension “.npz” for these archives. (sources)
So, .npz is just a ZipFile containing multiple “.npy” files. This ZipFile can be either compressed (by using np.savez_compressed) or uncompressed (by using np.savez).
This is similar to tarball archives on Unix-like systems: a tarball can be just an uncompressed archive containing other files, or a compressed archive when combined with a compression program (gzip, bzip2, etc.).
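To check the "zip of .npy files" description for yourself, here is a minimal sketch that writes an archive with np.savez into an in-memory buffer and then opens it with the standard zipfile module. The array names first and second are made up for illustration.

```python
import io
import zipfile

import numpy as np

a = np.arange(10)
b = np.linspace(0.0, 1.0, 5)

# Write an uncompressed .npz archive into memory instead of onto disk.
buf = io.BytesIO()
np.savez(buf, first=a, second=b)
buf.seek(0)

with zipfile.ZipFile(buf) as zf:
    # One member per saved array, named "<keyword>.npy".
    names = sorted(zf.namelist())
    # Compression method recorded for each member of the archive.
    compress_types = {info.filename: info.compress_type for info in zf.infolist()}
    # Each member is an ordinary .npy file that np.load can read on its own.
    with zf.open("first.npy") as member:
        restored = np.load(io.BytesIO(member.read()))

print(names)
print(compress_types)
```

With np.savez the members should be reported as zipfile.ZIP_STORED. Swapping in np.savez_compressed should report zipfile.ZIP_DEFLATED instead.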
NumPy also provides different APIs to produce this binary file output:
np.save --> Save an array to a binary file in NumPy .npy format
np.savez --> Save several arrays into a single file in uncompressed .npz format
np.savez_compressed --> Save several arrays into a single file in compressed .npz format
np.load --> Load arrays or pickled objects from .npy, .npz or pickled files
If we skim the NumPy source code, under the hood:
def _savez(file, args, kwds, compress, allow_pickle=True, pickle_kwargs=None):
    ...
    if compress:
        compression = zipfile.ZIP_DEFLATED
    else:
        compression = zipfile.ZIP_STORED
    ...

def savez(file, *args, **kwds):
    _savez(file, args, kwds, False)

def savez_compressed(file, *args, **kwds):
    _savez(file, args, kwds, True)
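From the caller's side, the save and load functions can be exercised in a short round trip. This is only a sketch: the file names and the arrays x and y are invented for the example, and everything is written to a temporary directory.

```python
import os
import tempfile

import numpy as np

x = np.arange(6).reshape(2, 3)
y = np.array([1.5, 2.5])

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "x.npy")
    npz_path = os.path.join(tmp, "xy.npz")
    npz_c_path = os.path.join(tmp, "xy_compressed.npz")

    np.save(npy_path, x)                       # one array -> .npy
    np.savez(npz_path, x=x, y=y)               # several arrays -> uncompressed .npz
    np.savez_compressed(npz_c_path, x=x, y=y)  # several arrays -> compressed .npz

    # np.load returns the array directly for a .npy file ...
    x_back = np.load(npy_path)

    # ... and a dictionary-like NpzFile for a .npz file.
    with np.load(npz_path) as archive:
        names = sorted(archive.files)
        y_back = archive["y"]

    # Loading is the same whether or not the archive was compressed.
    with np.load(npz_c_path) as archive_c:
        x_from_compressed = archive_c["x"]

print(names)
```

The keyword names passed to np.savez become the keys of the loaded archive, so the calling code does not need to know which of the two save functions produced the file.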
Then, back to the question:
If you only use np.savez, there is no compression on top of the .npy format; you just get a single archive file for the convenience of managing multiple related arrays.
If you use np.savez_compressed, the file takes less space on disk, at the cost of more CPU time for the compression (i.e. it is a bit slower).
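The size difference is easy to measure. The sketch below uses a hypothetical helper, npz_size, that saves into an in-memory buffer and returns the number of bytes written. The two test arrays are chosen as extremes (all zeros versus random values), so real data will usually fall somewhere in between.

```python
import io

import numpy as np


def npz_size(save_fn, **arrays):
    """Return the size in bytes of the archive written by save_fn (hypothetical helper)."""
    buf = io.BytesIO()
    save_fn(buf, **arrays)
    return len(buf.getvalue())


zeros = np.zeros((200, 200))                         # highly compressible
noise = np.random.default_rng(0).random((200, 200))  # poorly compressible

zeros_plain = npz_size(np.savez, data=zeros)
zeros_packed = npz_size(np.savez_compressed, data=zeros)
noise_plain = npz_size(np.savez, data=noise)
noise_packed = npz_size(np.savez_compressed, data=noise)

print("zeros:", zeros_plain, "->", zeros_packed)
print("noise:", noise_plain, "->", noise_packed)
```

The uncompressed archive is always slightly larger than the raw array data because of the .npy header and zip bookkeeping. How much np.savez_compressed saves depends entirely on how repetitive the data is.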