Tags: linux, memory-management, process, shared-libraries, loader

How are multiple copies of shared library text section avoided in physical memory?


When Linux loads shared libraries, my understanding is that the text section is loaded only once into physical memory and is then mapped into the page tables of the different processes that reference it.

But what ensures that the text section of the same shared library is not loaded into physical memory multiple times, and where is that check made?

Is the duplication avoided by the loader, by the mmap() system call, or in some other way, and how?

Edit1: I should have shown the research done so far. Here it is...

Tried to trace a simple sleep command.

$ strace sleep 100 &
[1] 22824
$ execve("/bin/sleep", ["sleep", "100"], [/* 26 vars */]) = 0
brk(0)                                  = 0x89bd000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=92360, ...}) = 0
mmap2(NULL, 92360, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f56000
close(3)                                = 0
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0`G\0004\0\0\0"..., 512) = 512
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f55000
fstat64(3, {st_mode=S_IFREG|0755, st_size=1706232, ...}) = 0
mmap2(0x460000, 1426884, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x460000
mmap2(0x5b7000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x156) = 0x5b7000
mmap2(0x5ba000, 9668, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x5ba000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f54000
...
munmap(0xb7f56000, 92360)               = 0
...

Then I checked the /proc/pid/maps file for this process:

$ cat /proc/22824/maps
00441000-0045c000 r-xp 00000000 fd:00 2622360    /lib/ld-2.5.so
...
00460000-005b7000 r-xp 00000000 fd:00 2622361    /lib/libc-2.5.so
...
00e3e000-00e3f000 r-xp 00e3e000 00:00 0          [vdso]
08048000-0807c000 r-xp 00000000 fd:00 5681559    /usr/bin/strace
...

Here the addr argument of the mmap2() call that maps libc.so.6 with PROT_READ|PROT_EXEC is a specific address (0x460000). This led me to believe that the shared library mapping in physical memory was somehow managed by the loader.
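Each maps line above records the address range, permissions, file offset, device, and inode of the backing file. As a minimal sketch (the `MapEntry` type and `parse_maps_line` helper are hypothetical names, not part of any tool), those fields can be pulled out like this, using the libc line from the listing as sample input:

```python
from typing import NamedTuple


class MapEntry(NamedTuple):
    start: int    # first address of the mapping
    end: int      # one past the last address
    perms: str    # e.g. "r-xp": read, no write, execute, private
    offset: int   # offset into the backing file
    dev: str      # major:minor of the device holding the file
    inode: int    # inode of the backing file (0 for anonymous mappings)
    path: str     # pathname, or a pseudo-name such as "[vdso]"


def parse_maps_line(line: str) -> MapEntry:
    """Split one /proc/<pid>/maps line into its fields."""
    parts = line.split(None, 5)
    addr, perms, offset, dev, inode = parts[:5]
    path = parts[5].strip() if len(parts) > 5 else ""
    start, end = (int(x, 16) for x in addr.split("-"))
    return MapEntry(start, end, perms, int(offset, 16), dev, int(inode), path)


# Sample line copied from the maps output in the question.
sample = "00460000-005b7000 r-xp 00000000 fd:00 2622361    /lib/libc-2.5.so"
entry = parse_maps_line(sample)
print(entry.perms, entry.dev, entry.inode, entry.path)
```

Running the same parser over the maps files of two processes and comparing the (dev, inode) pairs shows which mappings are backed by the same file.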


Solution

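  • Shared libraries are mapped with the mmap() syscall, and the deduplication falls out of how the Linux kernel implements file-backed mappings. The kernel keeps a page cache indexed by the file's inode and the offset within the file, so each page of a file's contents has at most one page-cache copy in physical memory, no matter how many processes map it.

    The dynamic linker (its code lives in /lib/ld-linux.so or similar) only issues these mmap() calls to map the libraries and then resolves symbols and applies relocations. The page-level deduplication is done entirely by the kernel.

    The text segment is mapped with PROT_READ|PROT_EXEC and MAP_PRIVATE, as the mmap2() call in your strace output shows (there is no PROT_SHARED flag). A private file mapping still points at the shared page-cache pages; a process gets its own copy of a page only if it writes to it (copy-on-write), which normally never happens for read-only text. You can check the flags by stracing any tool (for example strace /bin/echo).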
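    As a rough sketch of the kind of mapping described above, the following Python snippet maps a file privately and read-only, twice. The helper name `map_private_readonly` and the temporary demo file are assumptions made for illustration. PROT_EXEC is left out because some mounts refuse executable mappings. The snippet only shows the shape of the call; it does not itself prove that the two mappings share physical pages.

```python
import mmap
import os
import tempfile


def map_private_readonly(path: str) -> mmap.mmap:
    """Map a whole file read-only and private (hypothetical helper)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Length 0 maps the entire file. The mapping stays valid after
        # the descriptor is closed, as with the close(3) in the strace.
        return mmap.mmap(fd, 0, flags=mmap.MAP_PRIVATE, prot=mmap.PROT_READ)
    finally:
        os.close(fd)


# Stand-in for a library file: one 4096-byte page starting with the ELF magic.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x7fELF" + b"\0" * 4092)
    demo_path = f.name

first = map_private_readonly(demo_path)
second = map_private_readonly(demo_path)

# Both mappings read the same file contents; neither is ever written,
# so no copy-on-write copy is made for either of them.
same_bytes = first[:4] == second[:4] == b"\x7fELF"
print(same_bytes)

first.close()
second.close()
os.unlink(demo_path)
```

    Neither call names any physical memory: the caller passes only a descriptor, a length, and flags, and the kernel decides which physical pages back the mapping.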