
Edit existing .npy file imported using memmap


I am new to working with numpy.memmap objects and am having trouble figuring out how to edit an existing .npy file read into Python using numpy.memmap(). For example, following the example from Scipy.org, I can create an object and write to it, but once it has been created, I cannot modify its contents.

import numpy as np
from tempfile import mkdtemp
import os.path as path

data = np.arange(12, dtype='float32').reshape(3, 4)

filename = path.join(mkdtemp(), 'newfile.dat')
fp = np.memmap(filename, dtype='float32', mode='w+', shape=(3, 4))
fp[:] = data[:]  ### write data to fp array

del fp  ### remove fp object (deleting a memmap flushes it to disk)

fpc = np.memmap(filename, dtype='float32', mode='c', shape=(3, 4))  ### This is writeable in memory

fpc[0, :] = 0

del fpc  ### close object

This simply deletes the object from memory; the file at filename is not modified. I have also tried numpy.memmap.flush(fpc), but that doesn't seem to work either.
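The behaviour described above can be reproduced with a short sketch. It assumes the same float32 data with shape (3, 4) in a throwaway temporary directory, and uses np.fromfile only to re-read the raw bytes independently of any memmap object.

```python
import os.path as path
from tempfile import mkdtemp

import numpy as np

# Recreate the raw binary file from the question (float32, shape (3, 4)).
filename = path.join(mkdtemp(), 'newfile.dat')
fp = np.memmap(filename, dtype='float32', mode='w+', shape=(3, 4))
fp[:] = np.arange(12, dtype='float32').reshape(3, 4)
fp.flush()
del fp

# Open in copy-on-write mode and modify the first row.
fpc = np.memmap(filename, dtype='float32', mode='c', shape=(3, 4))
fpc[0, :] = 0
fpc.flush()  # accepted, but copy-on-write changes are never written back
in_memory_row = np.array(fpc[0, :])  # plain ndarray copy of the edited row
del fpc

# Re-read the file: the first row still holds the original values.
on_disk = np.fromfile(filename, dtype='float32').reshape(3, 4)
print(in_memory_row)  # [0. 0. 0. 0.]
print(on_disk[0])     # [0. 1. 2. 3.]
```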

I understand from reading other posts that one can simply copy the edited .npy file to another location, but this seems like it could become problematic in terms of disk space. Is it correct that you cannot modify an existing .npy file?


Solution

  • NumPy interprets "copy on write" (mode="c") as "write changes to RAM, but don't save them to disk" (see the numpy.memmap documentation). This is the standard meaning of copy-on-write for memory-mapped files: modified pages are copied into memory, and the file on disk, which may be shared with other processes, is left untouched. It sounds like you confused copy-on-write with snapshots (which sometimes use similar terminology, but refer to disk writes rather than RAM).

    If you change mode="c" to mode="r+" (or omit the mode keyword, since "r+" is the default), your changes are written back to the file when you call flush() or delete the object, which should solve your problem.

    Additionally, I would like to point out that in most cases it is simpler and more Pythonic to use np.save and np.load and specify the mmap_mode keyword (e.g. mmap_mode="r+") when loading the file. While this technically limits flexibility, the dtype and shape are read from the .npy header instead of being passed as keywords, which makes things a bit more concise.
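Both suggestions can be sketched as follows, assuming the same float32 (3, 4) data and illustrative filenames in a temporary directory. Note that the question's newfile.dat is raw binary with no header; np.memmap treats every byte as array data, so pointing it at a real .npy file would also map the header bytes, whereas np.load(..., mmap_mode="r+") parses the header and maps only the array data.

```python
import os.path as path
from tempfile import mkdtemp

import numpy as np

tmpdir = mkdtemp()

# Option 1: raw binary file opened with np.memmap in read/write mode.
raw_file = path.join(tmpdir, 'newfile.dat')
np.arange(12, dtype='float32').tofile(raw_file)

fp = np.memmap(raw_file, dtype='float32', mode='r+', shape=(3, 4))  # 'r+' is the default
fp[0, :] = 0
fp.flush()  # push the change to disk
del fp

raw_on_disk = np.fromfile(raw_file, dtype='float32').reshape(3, 4)

# Option 2: .npy file written by np.save and memory-mapped by np.load.
npy_file = path.join(tmpdir, 'newfile.npy')
np.save(npy_file, np.arange(12, dtype='float32').reshape(3, 4))

arr = np.load(npy_file, mmap_mode='r+')  # dtype and shape come from the .npy header
arr[0, :] = 0
arr.flush()
del arr

npy_on_disk = np.load(npy_file)
print(raw_on_disk[0])  # [0. 0. 0. 0.]
print(npy_on_disk[0])  # [0. 0. 0. 0.]
```

In both cases the edit is made in place on the existing file, so no second copy of the data is needed on disk.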