
using O_TMPFILE to clean up huge pages... or other methods?


My program uses huge pages. To do so, it opens files as follows:

oflags = O_RDWR | O_CREAT | O_TRUNC;
fd = open(filename, oflags, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);

where filename is in the hugetlbfs file system. That works: my program can then mmap() the created file descriptors. But if my program gets killed, the files remain, and in the huge page filesystem a leftover file is blocked memory, as shown by the following command (876 != 1024):

cat /proc/meminfo  | grep Huge

AnonHugePages:    741376 kB
HugePages_Total:    1024
HugePages_Free:      876
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

So, as my program does not share the file with anyone else, it made sense to me to create temporary files using the O_TMPFILE flag. So I tried:

oflags = O_RDWR | O_TMPFILE;
fd = open(pathname, oflags, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);

where pathname is the hugetlbfs mount point. That fails (for a reason I cannot explain) with the following error:

open failed for /dev/hugepages: Operation not supported

Why? and more to the point: How can I guarantee that all huge pages my program is using get freed?

Yes: I could catch some signals (e.g. SIGTERM), but not all of them (SIGKILL cannot be caught).

Yes: I could unlink() the file as soon as possible using the first approach, but what if SIGKILL is received between open() and unlink()?

Kernels like guarantees. So do I. What is the proper method to guarantee 100% cleanup regardless of when or how my program terminates?


Solution

  • Looks like O_TMPFILE is not implemented yet for hugetlbfs; indeed, this option requires support from the underlying filesystem:

    O_TMPFILE requires support by the underlying filesystem; only a subset of Linux filesystems provide that support. In the initial implementation, support was provided in the ext2, ext3, ext4, UDF, Minix, and shmem filesystems. XFS support was added in Linux 3.15.

    This is confirmed by looking at the kernel source code where there's no inode_ops->tmpfile() implementation in hugetlbfs.

    I believe that the right answer here is to work on this implementation...


    I noticed your comment about the unlink() option, however, maybe the following approach is not that risky:

    • open the file (by name) with O_TRUNC (so you can assume its size is 0)
    • unlink it
    • mmap() it with your target size

    If your program gets killed in the middle, the worst case is that an empty file is left behind.