
How does Linux manage memory when doing I/O?


As I understand it, when we read a file by calling open and then the read system call, the file's content on disk is first read into kernel space and then copied to user space, i.e. to the process. So if the file is big, say 1G, read will occupy 2G of physical memory: 1G mapped to kernel space and 1G mapped to the process. I know this may be wrong, but where am I wrong? How does Linux manage memory in a situation like reading a file? If I use mmap instead of read, how does Linux deal with it differently?


Solution

  • If the file is big, say 1G, read will occupy 2G of physical memory: 1G mapped to kernel space and 1G mapped to the process.

    No, this assumption is wrong.

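    Depending on the filesystem, the kernel may read the file in portions of, e.g., 4K (one page).

    When the user requests 1G bytes via the read system call, the kernel may copy only a 4K portion of the file to the user buffer and return the number of bytes actually read. The user may then repeat the read syscall with an adjusted size and buffer address. Either way, the process only needs a buffer as large as the amount it asks for per call, not one as large as the file.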
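The retry pattern described above can be sketched roughly as follows. This is a minimal illustration in Python, whose `os.readv` is a thin wrapper over the corresponding system call; the helper name `read_fully` and the throwaway temporary file are assumptions made for the example, not part of the question.

```python
import os
import tempfile


def read_fully(fd, size):
    """Read up to `size` bytes from fd, tolerating short reads.

    `read_fully` is a hypothetical helper name used for illustration.
    """
    buf = bytearray(size)      # user-space buffer, sized by the caller
    view = memoryview(buf)
    done = 0
    while done < size:
        # Ask for the remaining bytes, starting at the adjusted buffer offset.
        n = os.readv(fd, [view[done:]])
        if n == 0:             # 0 bytes read means end of file
            break
        done += n              # the kernel may return fewer bytes than requested
    return bytes(view[:done])


# Demo on a small throwaway file (assumed setup, just to exercise the loop).
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"x" * 10000)
    tmp.flush()
    fd = os.open(tmp.name, os.O_RDONLY)
    try:
        data = read_fully(fd, 1 << 20)   # request more than the file holds
    finally:
        os.close(fd)

print(len(data))  # prints 10000
```

A program that only scans a large file would normally call such a loop with a modest buffer and reuse it, so user-space memory stays at the buffer size regardless of the file size.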

    If I use mmap instead of read, how does Linux deal with it differently?

    In the case of mmap, reading each 4K block of the file may be deferred until that block is actually accessed by the user.

    This follows the general treatment of user memory: the user operates on virtual memory, which needs to be mapped to physical memory only when it is accessed.
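To make the mmap case concrete, here is a small sketch using Python's `mmap` module, which wraps the mmap system call. The three-page temporary file and the variable names are assumptions for the example. Whether touching a page triggers real disk I/O depends on what is already in the page cache and on kernel readahead, so the comments only describe what may happen.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # page size reported by the system, commonly 4096

with tempfile.NamedTemporaryFile() as tmp:
    # Assumed test data: three pages filled with "A", "B" and "C".
    tmp.write(b"A" * PAGE + b"B" * PAGE + b"C" * PAGE)
    tmp.flush()

    fd = os.open(tmp.name, os.O_RDONLY)
    try:
        # Length 0 maps the whole file. Creating the mapping reserves
        # virtual addresses; it does not have to load the file's pages.
        m = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
    finally:
        os.close(fd)  # the mapping remains usable after closing the descriptor

    try:
        mapped_len = len(m)
        # Accessing a byte in the third page is the point at which the
        # kernel may have to fault that page in.
        third = m[2 * PAGE]
    finally:
        m.close()

print(mapped_len == 3 * PAGE, chr(third))  # prints True C
```

Only the pages that are actually touched need to be backed by physical memory, which is why mapping a very large file does not by itself consume that much RAM.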