python numpy memory-mapped-files joblib

load np.memmap without knowing shape


Is it possible to load a numpy.memmap without knowing the shape and still recover the shape of the data?

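import numpy as np

filename = 'data.dat'  # placeholder path; any writable location works

data = np.arange(12, dtype='float32').reshape(3, 4)
fp = np.memmap(filename, dtype='float32', mode='w+', shape=(3, 4))
fp[:] = data[:]
del fp  # flush changes to disk and release the memmap
newfp = np.memmap(filename, dtype='float32', mode='r', shape=(3, 4))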

In the last line, I would like to omit the shape and still have newfp come back with the shape (3, 4), just as it would with joblib.load. Is this possible? Thanks.


Solution

  • Not unless that information has been explicitly stored in the file somewhere. As far as np.memmap is concerned, the file is just a flat buffer.

    I would recommend using np.save to persist numpy arrays, since this also preserves the metadata specifying their dimensions, dtypes etc. You can also load an .npy file as a memmap by passing the mmap_mode= parameter to np.load.
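
As a minimal sketch of that suggestion, the question's example could be rewritten around np.save and np.load. The np.load keyword is spelled mmap_mode; the temporary directory and the file name data.npy are placeholders chosen for illustration, not part of the original question.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "data.npy")

data = np.arange(12, dtype="float32").reshape(3, 4)
np.save(path, data)  # the .npy header records shape and dtype

# No shape or dtype is passed here: both are read from the header.
newfp = np.load(path, mmap_mode="r")
print(type(newfp).__name__, newfp.shape, newfp.dtype)
```

The object returned is an np.memmap, so slices are read from disk on demand rather than loading the whole array into memory.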

    joblib.dump uses a combination of pickling to store generic Python objects and np.save to store numpy arrays.


    To initialize an empty memory-mapped array backed by a .npy file you can use numpy.lib.format.open_memmap:

    import numpy as np
    from numpy.lib.format import open_memmap
    
    # initialize an empty 10TB memory-mapped array
    x = open_memmap('/tmp/bigarray.npy', mode='w+', dtype=np.ubyte, shape=(10**13,))
    

    You might be surprised that this succeeds even if the array is larger than the total available disk space (my laptop only has a 500GB SSD, but I just created a 10TB memmap). This works because the file that's created is sparse: disk blocks are only allocated as data is actually written, which relies on the filesystem supporting sparse files.
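
To tie this back to the original question, here is a small sketch (deliberately using a tiny array rather than 10TB) showing that a file created with open_memmap can later be reopened without restating its shape or dtype. The path and the array contents are assumptions made up for the example.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

# Hypothetical scratch location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "small.npy")

# Create the .npy-backed memmap and fill it in place.
out = open_memmap(path, mode="w+", dtype=np.float32, shape=(3, 4))
out[:] = np.arange(12, dtype=np.float32).reshape(3, 4)
out.flush()
del out

# Reopen read-only: shape and dtype come from the .npy header.
reopened = open_memmap(path, mode="r")
print(reopened.shape, reopened.dtype)
```

np.load(path, mmap_mode="r") should work equally well for the reopening step; open_memmap is mainly needed for creating the file in the first place.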

    Credit for discovering open_memmap should go to kiyo's previous answer here.