
Do memory mapped files in Docker containers in Kubernetes work the same as in regular processes in Linux?


I have process A and process B. Process A opens a file, calls mmap, and writes to the mapped region. Process B does the same, but reads the same mapped region once process A has finished writing.

Using mmap, process B is supposed to read the file from memory instead of disk, assuming process A has not called munmap.

If I deploy process A and process B to different containers in the same pod in Kubernetes, is memory-mapped IO supposed to work the same way as in the initial example? Should container B (process B) read the file from memory, as it does on my regular Linux desktop?

Let's assume both containers are in the same pod and are reading/writing the file from the same persistent volume. Do I need to consider a specific type of volume to achieve mmap IO?

In case you are curious, I am using Apache Arrow and pyarrow to read and write those files and achieve zero-copy reads.


Solution

  • A Kubernetes pod is a group of containers that are deployed together on the same host, so this question is really about what happens when multiple containers run on the same host.

    Containers are isolated on a host using a number of different technologies. Two of them might be relevant here. Neither prevents two processes in different containers from sharing the same memory when they mmap a file.

    The two things to consider are how the file systems are isolated and how memory is ring fenced (limited).

    How the file systems are isolated

    The trick used is to create a mount namespace, so that new mount points are not visible to processes outside it. File systems are then mounted into a directory tree, and finally the process calls chroot (container runtimes typically use the related pivot_root call) to make that directory tree its /.

    No part of this affects the way processes mmap files. It only changes how file names and file paths resolve for the two different processes.

    Even if, as part of that setup, the same file system was mounted from scratch by the two different processes, the result would be the same as a bind mount. That means the same file system exists under two paths, but it is the same file system, not a copy.

    Any attempt to mmap files in this situation would be identical to two processes in the same namespace.
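    As a rough way to check this on a given setup, the sketch below compares the device and inode numbers behind two paths and then maps the file through both. The helper names are hypothetical, and a hard link is used only as a stand-in for a bind mount (creating a real bind mount needs root). The assumption is that both paths resolve to the same inode on the same mounted file system, which is what a bind mount gives you; separately mounted network file systems may behave differently.

```python
import mmap
import os
import tempfile


def same_underlying_file(path_a: str, path_b: str) -> bool:
    """Return True if both paths resolve to the same inode on the same device."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def demo_two_paths() -> tuple:
    """Write through a mapping of one path and read through a mapping of the other."""
    directory = tempfile.mkdtemp()
    first = os.path.join(directory, "data.bin")
    second = os.path.join(directory, "alias.bin")

    with open(first, "wb") as f:
        f.write(b"\x00" * 16)  # a file must have a non-zero size to be mapped

    # Stand-in for "the same file system visible under a second path".
    os.link(first, second)

    with open(first, "r+b") as fa, open(second, "rb") as fb:
        # On Unix the default mapping is shared, so writes reach the page cache.
        with mmap.mmap(fa.fileno(), 0) as writer, \
                mmap.mmap(fb.fileno(), 0, access=mmap.ACCESS_READ) as reader:
            writer[0:5] = b"hello"
            return same_underlying_file(first, second), bytes(reader[0:5])


if __name__ == "__main__":
    print(demo_two_paths())
```

    Running the same comparison on the volume path from inside each container is one way to confirm that both containers really see one file rather than two copies.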

    How memory limits are applied

    This is done through cgroups. cgroups don't really isolate anything; they just put limits on the resources that a group of processes can use.

    But there is a natural question to ask: if two processes have different memory limits through cgroups, can they still share the same memory? Yes, they can!

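    "Note: file and shmem may be shared among other cgroups. In that case, mapped_file is accounted only when the memory cgroup is owner of page cache."

    This note, from the kernel's cgroup memory documentation, is a little obscure, but it describes how memory is accounted in such situations: shared page cache pages are charged to the cgroup that owns them, not to every cgroup that maps them.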
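    To see this accounting from inside a container, one option is to read the cgroup's memory.stat file and look at the mapped-file counter. The following is a minimal sketch, not a definitive tool. It assumes the usual in-container paths (/sys/fs/cgroup/memory.stat for cgroup v2, /sys/fs/cgroup/memory/memory.stat for cgroup v1), which may differ on your cluster, and the function names are hypothetical. The counter is named mapped_file in cgroup v1 and file_mapped in cgroup v2.

```python
from typing import Optional

# Assumed locations of memory.stat inside a container; adjust for your setup.
CANDIDATE_PATHS = (
    "/sys/fs/cgroup/memory.stat",         # cgroup v2
    "/sys/fs/cgroup/memory/memory.stat",  # cgroup v1
)


def mapped_file_bytes(memory_stat_text: str) -> Optional[int]:
    """Extract the mapped-file counter (in bytes) from memory.stat contents."""
    stats = {}
    for line in memory_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    # cgroup v1 uses "mapped_file"; cgroup v2 uses "file_mapped".
    for key in ("mapped_file", "file_mapped"):
        if key in stats:
            return stats[key]
    return None


def read_own_memory_stat() -> Optional[str]:
    """Return the text of the first readable memory.stat candidate, if any."""
    for candidate in CANDIDATE_PATHS:
        try:
            with open(candidate) as f:
                return f.read()
        except OSError:
            continue
    return None


if __name__ == "__main__":
    text = read_own_memory_stat()
    print(mapped_file_bytes(text) if text is not None else "memory.stat not found")
```

    Comparing the value in both containers while the file is mapped should show the shared pages counted mainly in one of them, in line with the note quoted above.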

    Conclusion

    Two processes in different containers on the same host that both memory-map the same file from the same file system will behave almost exactly as if the two processes were in the same container.
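    The scenario from the question can be sketched with Python's standard mmap module. This is only an illustration: write_region and read_region are hypothetical names standing in for process A and process B, and here they run in one process for brevity. In a pod, each would run in its own container against a path on the shared volume.

```python
import mmap
import os
import tempfile


def write_region(path: str, payload: bytes) -> None:
    """Role of process A: size the file, map it shared, and write into the mapping."""
    with open(path, "w+b") as f:
        f.truncate(len(payload))  # a mapping cannot extend past the end of the file
        with mmap.mmap(f.fileno(), len(payload), access=mmap.ACCESS_WRITE) as m:
            m[:] = payload        # the bytes land in the kernel page cache
            m.flush()             # ask the kernel to write dirty pages back to the file


def read_region(path: str) -> bytes:
    """Role of process B: map the same file read-only and copy the bytes out."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return bytes(m[:])


if __name__ == "__main__":
    shared_path = os.path.join(tempfile.mkdtemp(), "shared.bin")
    write_region(shared_path, b"written by A")
    print(read_region(shared_path))
```

    If the pages written by A are still in the page cache when B maps the file, B's reads are served from memory rather than from disk, whether or not the two processes share a container.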