Tags: c, linux, file-io, linux-kernel, memory-mapped-files

Mapping a file into memory and writing beyond the end of the file


I'm experimenting with memory-mapped files in Linux and have a question about what actually happens when the same file is mapped from different processes and one of them writes beyond the end of the file.

I created a file by hand with vim and wrote 2 characters to it (vim also appends a trailing newline, so the file is 3 bytes long):

$ cat test_mmap
aa

Then I wrote 2 very simple programs.

The first program maps the file and modifies the mapping without calling msync or munmap.

writer.c:

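#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void){
    int fd = open("/tmp/test_mmap", O_CREAT | O_RDWR, S_IRWXU);
    /* Map one full page, although the file is only a few bytes long. */
    char *mapped_region = mmap(NULL, sysconf(_SC_PAGESIZE), PROT_WRITE, MAP_SHARED, fd, 0);
    if (mapped_region == MAP_FAILED) return 1;
    mapped_region[0] = '0';
    mapped_region[1] = '1';
    mapped_region[2] = '2';
    mapped_region[3] = '3';
    mapped_region[4] = '4';
    mapped_region[5] = '5';  /* the last writes land beyond the end of the file */
    return 0;
}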

The second one reads from the mapping.

reader.c:

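#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void){
    int fd = open("/tmp/test_mmap", O_CREAT | O_RDWR, S_IRWXU);
    /* PROT_READ, since this program only reads from the mapping. */
    char *mapped_region = mmap(NULL, sysconf(_SC_PAGESIZE), PROT_READ, MAP_SHARED, fd, 0);
    if (mapped_region == MAP_FAILED) return 1;
    printf("%c\n", mapped_region[0]);
    printf("%c\n", mapped_region[1]);
    printf("%c\n", mapped_region[2]);
    printf("%c\n", mapped_region[3]);
    printf("%c\n", mapped_region[4]);
    printf("%c\n", mapped_region[5]);
    return 0;
}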

So I ran

$ ./writer && ./reader && cat /tmp/test_mmap
0
1
2
3
4
5
012

This suggests that data written beyond the end of the file is preserved in the mapping for some time (although it is not written out to the file), and that if another process subsequently maps the same region, the bytes beyond the end of the file are not zeroed, contrary to what the man page specifies:

A file is mapped in multiples of the page size. For a file that is not a multiple of the page size, the remaining memory is zeroed when mapped, and writes to that region are not written out to the file.
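The second half of that man-page statement (writes to the tail of the last page are not written out to the file) can be checked in isolation. Below is a minimal sketch, assuming Linux and a writable /tmp; the helper name size_after_tail_write and the path passed to it are hypothetical and only serve the illustration.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates a 3-byte file at `path`, maps one page of it,
 * writes 6 bytes through the mapping, and returns the file size afterwards
 * (or -1 on error). The file is removed again before returning. */
long size_after_tail_write(const char *path)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, "aa\n", 3) != 3) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    for (int i = 0; i < 6; i++)
        p[i] = (char)('0' + i);   /* offsets 3..5 are past the end of the file */

    msync(p, len, MS_SYNC);       /* ask the kernel to write the dirty page back */
    munmap(p, len);

    struct stat st;
    long size = (fstat(fd, &st) == 0) ? (long)st.st_size : -1;
    close(fd);
    unlink(path);
    return size;
}
```

Called from a main function as size_after_tail_write("/tmp/tail_demo"), this should return 3 on Linux: the first three bytes of the file are overwritten, but the file does not grow, even after an explicit msync.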

Running the reader with perf stat -e major-faults ./reader shows that

0      major-faults

meaning that no pages were read from disk. Also, looking at /proc/<pid_writer>/smaps, I observed that the page is marked as dirty and private (even though the mapping was created with the MAP_SHARED flag):

7fc80f279000-7fc80f27a000 -w-s 00000000 fd:00 6057290   /tmp/test_mmap
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB

When I ran the reader process again after some time (how long does one have to wait?), I observed:

$ ./reader
0
1
2

Question: Is it correct, and documented anywhere, that when one process modifies a mapping beyond the end of the file, the page is marked dirty, and that as long as it stays dirty, another process mapping the same region of the same file sees the data written earlier instead of zeroes?


Solution

  • The definitive reference in these matters is POSIX, which has this to say in the rationale section for mmap:

    The mmap() function can be used to map a region of memory that is larger than the current size of the object. [... snip discussion on sending SIGBUS if possible, i.e. when accessing a page beyond the end of the file ...] written data may be lost and read data may not reflect actual data in the object.

    So POSIX says that doing this can result in lost data. Portability is also questionable at best (think of no-MMU systems, the interaction with huge pages, platforms with different page sizes...).
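One common way to stay clear of the behaviour discussed above is to grow the file to the required size before mapping it, so that every byte written through the mapping is backed by the file. The following is only a sketch under that assumption, not something the POSIX text prescribes; the helper name write_via_mmap is hypothetical, and len must be greater than zero because mmap rejects zero-length mappings.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: resizes the file at `path` to `len` bytes, then copies
 * `data` into it through a shared mapping. Returns 0 on success, -1 on error. */
int write_via_mmap(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    /* Resize the file first so every mapped byte we touch
     * lies within the file itself. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    memcpy(p, data, len);
    int rc = msync(p, len, MS_SYNC); /* request write-back before unmapping */
    munmap(p, len);
    return rc;
}
```

For example, write_via_mmap("/tmp/test_mmap", "012345", 6) should leave a 6-byte file containing 012345, because none of the writes fall beyond the end of the file.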