c memory-management mmap

Can mmap map a file in any way other than page-level mapping?


Recently, during an interview, I was asked whether mmap can map a program in binary mode directly, without reference to pages.

I think that is not possible, since mmap

allows an application to map a file into memory, meaning that there is a one-to-one correspondence between a memory address and a word in the file. The programmer can then access the file directly through memory, identically to any other chunk of memory-resident data—it is even possible to allow writes to the memory region to transparently map back to the file on disk

Accessing a file through mmap without paging being involved sounds wrong.

Still, I want to know whether there is any way mmap can map a file into memory other than the page way.

=====
the page way
=====

The page is the smallest unit of memory that can have distinct permissions and behavior. Consequently, the page is the building block of memory mappings, which in turn are the building blocks of the process address space. The mmap() system call operates on pages. Both the addr and offset parameters must be aligned on a page-sized boundary. That is, they must be integer multiples of the page size.

Mappings are, therefore, integer multiples of pages. If the len parameter provided by the caller is not aligned on a page boundary—perhaps because the underlying file's size is not a multiple of the page size—the mapping is rounded up to the next full page
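The rounding described in the quote can be sketched in a few lines of C. This is only an illustration: the helper names round_up_to_page and is_page_aligned are made up for this example, and the page size is queried with sysconf(_SC_PAGESIZE) rather than hard-coded.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: round len up to the next multiple of the page size,
 * which is the amount of address space a mapping of len bytes occupies. */
size_t round_up_to_page(size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (len + page - 1) / page * page;
}

/* Hypothetical helper: nonzero if offset is an integer multiple of the
 * page size, as mmap() requires for its file offset argument. */
int is_page_aligned(off_t offset)
{
    long page = sysconf(_SC_PAGESIZE);
    return offset % page == 0;
}
```

For example, with a 4096-byte page, a 5-byte request still occupies one full 4096-byte page of address space.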


Solution

  • All memory maps involve page level mapping, if we use the Wikipedia definition of page:

    A page, memory page, or virtual page is a fixed-length contiguous block of virtual memory, described by a single entry in the page table. It is the smallest unit of data for memory management in a virtual memory operating system.

    As described in the mmap(2) man page,

    mmap() creates a new mapping in the virtual address space of the calling process.

    The mapping is defined by entries in the page table.

    So, essentially, mmap() is a tool for managing virtual memory at the page level.
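    One way to see the page granularity from user space is to map a file much smaller than a page and read past its end. The sketch below assumes Linux and a writable /tmp; the function name mapped_byte_at and the temporary file name are made up for this illustration. Bytes beyond the end of the file, but inside the same page, are still accessible and read as zero.

    ```c
    #ifndef _DEFAULT_SOURCE
    #define _DEFAULT_SOURCE 1 /* mkstemp() under strict -std= modes (glibc) */
    #endif
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Hypothetical helper: write text (non-empty) to a temporary file, map
     * only strlen(text) bytes of it, and return the byte at index idx of the
     * mapping. Returns -1 on error or if idx lies outside the first page. */
    int mapped_byte_at(const char *text, size_t idx)
    {
        char path[] = "/tmp/mmap_page_demo_XXXXXX";
        size_t len = strlen(text);
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        if (len == 0 || idx >= page)
            return -1;

        int fd = mkstemp(path);
        if (fd < 0)
            return -1;
        unlink(path); /* the file disappears once fd is closed */
        if (write(fd, text, len) != (ssize_t)len) {
            close(fd);
            return -1;
        }

        /* We ask for len bytes, but the kernel maps a whole page. */
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd); /* the mapping stays valid after close() */
        if (p == MAP_FAILED)
            return -1;

        int byte = p[idx];
        munmap(p, len);
        return byte;
    }
    ```

    Calling mapped_byte_at("hello", 0) returns 'h', while mapped_byte_at("hello", 100) returns 0 instead of faulting, because index 100 is still inside the single page that backs the mapping.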


    The interviewer was probably trying to find out whether you understood the difference between low-level I/O (read(), write()) and file-backed memory mapping behaviour.

    If you open a file using the O_DIRECT flag, the kernel tries to transfer the data directly to the userspace buffers, bypassing the page cache.

    Because of how memory maps work, having the backing file open()ed with or without the O_DIRECT flag has no effect on the memory map.

    (The MAP_SHARED/MAP_PRIVATE flag affects whether the memory used for the accessed parts of the mapping stays in the page cache or not. Typically, the Linux kernel uses a copy-on-write approach: the pages stay read-only in the page cache until the first write access. At that point, a private mapping gets the page copied to a new page of its own (or evicted), and a shared mapping has the page marked read-write. It is a bit complicated, but it is quite efficient. However, all of this, too, relies on virtual memory paging.)
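    The visible consequence of the shared/private distinction can be checked with a small experiment. This is a sketch assuming Linux and a writable /tmp; the function name file_byte_after_mapped_write is made up for this illustration. It writes through a mapping and then reads the file back with pread() to see whether the write reached the file.

    ```c
    #ifndef _DEFAULT_SOURCE
    #define _DEFAULT_SOURCE 1 /* mkstemp(), pread() under strict -std= modes */
    #endif
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Hypothetical helper: create a one-byte file containing 'A', map it with
     * the given flags (MAP_SHARED or MAP_PRIVATE), store 'B' through the
     * mapping, and return the byte that pread() then sees in the file.
     * Returns -1 on error. */
    int file_byte_after_mapped_write(int flags)
    {
        char path[] = "/tmp/mmap_cow_demo_XXXXXX";
        int fd = mkstemp(path);
        if (fd < 0)
            return -1;
        unlink(path);
        if (write(fd, "A", 1) != 1) {
            close(fd);
            return -1;
        }

        char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, flags, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }

        p[0] = 'B';            /* first write access to the page */
        msync(p, 1, MS_SYNC);  /* flush a shared mapping; harmless for a private one */

        char c = 0;
        ssize_t n = pread(fd, &c, 1, 0);
        munmap(p, 1);
        close(fd);
        return n == 1 ? c : -1;
    }
    ```

    With MAP_SHARED the function returns 'B', since the store went to the page shared with the file. With MAP_PRIVATE it returns 'A', since the store landed on a private copy of the page and the file is unchanged.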

    It is even possible to construct a memory mapping with no backing at all (a PROT_NONE mapping). Any access to the mapping causes the kernel to generate a SIGSEGV signal (delivered to the thread attempting the access), which can be caught by the process. The signal handler can decode and skip the faulting instruction, thus emulating memory accesses. It can even read one or more bytes from the file using O_DIRECT. Again, the mapping is based on virtual memory, and thus pages; there is just no RAM used for the mapping, and instead all accesses are emulated. This is rarely used, because it is unimaginably slow.
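    A much reduced sketch of the first half of that idea follows, assuming Linux, where touching a PROT_NONE page is reported as SIGSEGV with the faulting address in si_addr. It does not decode or emulate the instruction; it only shows that the access is trapped and that the handler learns which address was touched. The names on_fault and prot_none_read_faults are made up for this illustration, and the handler escapes with siglongjmp() instead of resuming the access.

    ```c
    #ifndef _DEFAULT_SOURCE
    #define _DEFAULT_SOURCE 1 /* MAP_ANONYMOUS, sigaction() under strict -std= modes */
    #endif
    #include <setjmp.h>
    #include <signal.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static sigjmp_buf fault_jmp;
    static void *volatile fault_addr;

    /* Record the faulting address and jump back out of the faulting access. */
    static void on_fault(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig;
        (void)ctx;
        fault_addr = info->si_addr;
        siglongjmp(fault_jmp, 1);
    }

    /* Returns 1 if reading a PROT_NONE page was trapped at an address inside
     * that page, 0 if the read unexpectedly succeeded, -1 on any other error. */
    int prot_none_read_faults(void)
    {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        volatile char *p = mmap(NULL, page, PROT_NONE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return -1;

        struct sigaction sa, old;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = on_fault;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGSEGV, &sa, &old) != 0) {
            munmap((void *)p, page);
            return -1;
        }

        int result;
        if (sigsetjmp(fault_jmp, 1) == 0) {
            char c = p[5]; /* no access is permitted, so this traps */
            (void)c;
            result = 0;
        } else {
            char *a = (char *)fault_addr;
            result = (a >= (char *)p && a < (char *)p + page) ? 1 : -1;
        }

        sigaction(SIGSEGV, &old, NULL); /* restore the previous handler */
        munmap((void *)p, page);
        return result;
    }
    ```

    A real emulator would, instead of jumping away, inspect the saved register context passed to the handler, perform the access on the program's behalf, and advance the instruction pointer, which is where the extreme slowness comes from.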