python, numpy, numpy-memmap

Is it possible to close a memmap'd temporary file without flushing its contents?


Use case: processing enormous images. I use memory-mapped temporary files when the intermediate dataset exceeds physical memory. I have no need to keep the intermediate results on disk once I'm done with them. When I delete them, numpy seems to flush all their contents to disk first and only then remove the file from the file system. The flushing taxes the I/O resources and the file system, which, to my understanding, is logically unnecessary given that the file is deleted right afterwards.

Is it possible to close a memmap'd temporary file without flushing its contents?


Solution

  • You need to open your memory map as copy-on-write, using mode 'c'. From the numpy.memmap documentation:

    mode : {'r+', 'r', 'w+', 'c'}, optional
    

    The file is opened in this mode:

    'r'     Open existing file for reading only.
    'r+'    Open existing file for reading and writing.
    'w+'    Create or overwrite existing file for reading and writing.
    'c'     Copy-on-write: assignments affect data in memory, but changes 
            are not saved to disk. The file on disk is read-only.
    

    Default is 'r+'.

    So the default is to allow reading and writing, but altering a memory-mapped file opened this way will indeed cause all changes to be written back. The changes can be flushed at any time, and a flush will certainly take place when you close the file.

    When you use 'c' as the mode, a write causes the affected page to be copied (transparently), and the copied pages are discarded when you close the file.

    Note that when you write to enough pages, the OS will have to swap memory pages to disk. This is no different from any other process using more memory than is available. When you close the mmapped file, any such copied pages (swapped to disk or still in memory) are discarded again.
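The behaviour described above can be checked with a minimal sketch. It assumes a small scratch array and a hypothetical file name (`scratch.dat`) inside a temporary directory; neither comes from the question. One caveat worth noting: with mode 'c' the file must already exist and be at least as large as the mapped array, so the sketch pre-sizes it with `truncate` before mapping.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch-array layout, chosen only for illustration.
shape = (256, 256)
dtype = np.float32
nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "scratch.dat")  # hypothetical file name

    # Mode 'c' does not create the file, so pre-size it first.
    # The truncated file reads back as all zeros.
    with open(path, "wb") as f:
        f.truncate(nbytes)

    # Copy-on-write mapping: writes land in private pages, not in the file.
    scratch = np.memmap(path, dtype=dtype, mode="c", shape=shape)
    scratch[:] = 1.0
    in_memory_sum = float(scratch.sum())

    # Drop the mapping; the modified pages are discarded, not written back.
    del scratch

    # Re-read the file to confirm that none of the writes reached it.
    on_disk = np.fromfile(path, dtype=dtype)
    disk_untouched = not on_disk.any()

print(in_memory_sum, disk_untouched)
```

Because the modified pages are private to the process, they count against RAM and swap rather than the file, which matches the note about swapping above.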