
Address of a Global Variable in the Heap Address Range


I was debugging the MPlayer-1.3.0 source code and noticed a global variable whose address (as reported by GDB, or by simply printing it) falls in the range used for heap allocations rather than in the data section. I checked the heap range using procfs (/proc/<pid>/maps):

555555554000-555555834000 r-xp 00000000 08:12 798876  /usr/bin/mplayer
555555a33000-555555b25000 r--p 002df000 08:12 798876  /usr/bin/mplayer
555555b25000-555555b2b000 rw-p 003d1000 08:12 798876  /usr/bin/mplayer
555555b2b000-555556479000 rw-p 00000000 00:00 0       [heap]
7fffc3fff000-7fffc8000000 rw-s 00000000 00:16 1932    /dev/shm/pulse-shm-3887887751

The variable is defined as "int verbose = 0;" at line 40 of mp_msg.c, and its address is 0x555555b3bbb0, which lies inside the [heap] mapping. I also checked the variable definitions before and after it:

int mp_msg_levels[MSGT_MAX]; // verbose level of this module. initialized to -2
int mp_msg_level_all = MSGL_STATUS;
int verbose = 0;
int mp_msg_color = 0;
int mp_msg_module = 0;

Of these, only mp_msg_level_all is located in the data section. Any help is appreciated.


Solution

  • Assuming your question is "why is int verbose = 0; placed in the [heap] memory mapping according to /proc/self/maps?", the answer is that

    1. the whole [heap] notion is really a relic of the long forgotten past, and
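    2. the traditional [heap] starts immediately after the .bss, and they usually share the same mapping, so there is nothing to be surprised about here. A zero-initialized global such as verbose is normally placed in .bss, whereas mp_msg_level_all (whose initializer MSGL_STATUS is presumably nonzero) goes into .data, which is why only the latter shows up in the file-backed rw-p mapping.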
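To make the lookup concrete, here is a minimal sketch of how one might find which line of a maps listing contains a given address. The names `struct mapping`, `parse_maps_line`, `mapping_contains`, `find_mapping` and the `DEMO_MAIN` macro are made up for this illustration; only the /proc/self/maps line format comes from the listing in the question.

```c
#include <stdint.h>
#include <stdio.h>

/* One parsed line of /proc/<pid>/maps (hypothetical helper type). */
struct mapping {
    uint64_t start;   /* first address of the mapping (inclusive) */
    uint64_t end;     /* one past the last address (exclusive) */
    char perms[5];    /* e.g. "rw-p" */
    char name[256];   /* pathname or pseudo-name such as "[heap]"; may be empty */
};

/* Parse a single maps line; returns 1 on success, 0 on malformed input. */
int parse_maps_line(const char *line, struct mapping *m)
{
    unsigned long long lo, hi;

    m->name[0] = '\0';
    /* Fields: range perms offset dev inode [name]; the name is optional. */
    int n = sscanf(line, "%llx-%llx %4s %*s %*s %*s %255[^\n]",
                   &lo, &hi, m->perms, m->name);
    if (n < 3)
        return 0;
    m->start = lo;
    m->end = hi;
    return 1;
}

/* True if addr lies inside the half-open range [start, end). */
int mapping_contains(const struct mapping *m, uint64_t addr)
{
    return addr >= m->start && addr < m->end;
}

/* Scan a maps-format stream for the mapping that contains addr. */
int find_mapping(FILE *fp, uint64_t addr, struct mapping *out)
{
    char line[512];

    while (fgets(line, sizeof line, fp)) {
        if (parse_maps_line(line, out) && mapping_contains(out, addr))
            return 1;
    }
    return 0;
}

#ifdef DEMO_MAIN  /* build with -DDEMO_MAIN to run standalone on Linux */
int initialized_global = 5;  /* nonzero initializer: normally placed in .data */
int zeroed_global = 0;       /* zero initializer: normally placed in .bss */

static void report(const char *label, void *p)
{
    struct mapping m;
    FILE *fp = fopen("/proc/self/maps", "r");

    if (!fp) {
        perror("fopen");
        return;
    }
    if (find_mapping(fp, (uint64_t)(uintptr_t)p, &m))
        printf("%-20s %p  %s %s\n", label, p, m.perms,
               m.name[0] ? m.name : "(anonymous)");
    fclose(fp);
}

int main(void)
{
    report("initialized_global", &initialized_global);
    report("zeroed_global", &zeroed_global);
    return 0;
}
#endif
```

Fed the [heap] line from the question, `mapping_contains` reports that 0x555555b3bbb0 falls inside it. Whether `zeroed_global` in the demo shows up under [heap] or under an unnamed anonymous mapping depends on the kernel and on how the executable is laid out, so treat the demo output as something to inspect rather than a fixed result.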

    Expanding on point 1 a bit: in the traditional UNIX memory model of old (before threads and mmap became a thing), on processors where the stack grows down, the top half of the address space was reserved for the kernel and the stack started at the highest end of user memory. The program's .text started at address 0, with .data and .bss immediately following, and the heap (the brk / sbrk kind) came right after .bss. This let the heap grow toward higher addresses while the stack grew down toward it, which gave the combined heap+stack the maximum available memory.

    That model doesn't work well at all in the presence of threads, shared libraries and memory mapped files, and modern malloc implementations have largely moved away from it. They rely on sbrk much less, or not at all, and instead mmap the memory they need (glibc, for example, still grows its main arena with brk, but obtains large blocks and additional arenas via mmap). Any memory obtained that way will not show up in the [heap] mapping that you see in procfs.
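As a rough illustration of the point about mmap-backed allocations, the sketch below checks whether a malloc'ed block lies between the end of .bss and the current program break. It assumes Linux with a GNU-style linker that provides the `end` symbol (see end(3)); `addr_in_range`, `classify` and the `DEMO_MAIN` macro are names invented for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes sbrk() in <unistd.h> on glibc */
#endif
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* True if p lies in the half-open range [lo, hi). */
int addr_in_range(const void *p, const void *lo, const void *hi)
{
    uintptr_t a = (uintptr_t)p;

    return a >= (uintptr_t)lo && a < (uintptr_t)hi;
}

#ifdef DEMO_MAIN  /* build with -DDEMO_MAIN to run standalone on Linux */
/* Assumption: the linker defines `end` as the first address past .bss. */
extern char end;

static void classify(const char *label, size_t size)
{
    void *p = malloc(size);

    if (!p) {
        perror("malloc");
        return;
    }
    /* Sample the break after malloc, since malloc may have just moved it. */
    void *brk_now = sbrk(0);

    printf("%-6s %p  %s\n", label, p,
           addr_in_range(p, &end, brk_now)
               ? "inside the brk area"
               : "outside the brk area (likely mmap)");
    free(p);
}

int main(void)
{
    classify("small", 64);
    classify("large", (size_t)64 << 20);   /* 64 MiB */
    return 0;
}
#endif
```

On a typical glibc system I would expect the small block to land inside the brk area and the 64 MiB block outside it, but that is allocator policy rather than a guarantee, and other allocators may place both elsewhere.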

    P.S.

    • The idea of mapping page zero into the process space has long been abandoned, as it only leads to bugs. This is why .text starts at higher addresses on all modern UNIXen.
    • Giving the kernel half of the available address space is also quite wasteful, and 32-bit Linux gives the kernel much less (commonly 1 GiB of the 4 GiB address space). On 64-bit systems, running out of address space is no longer an issue.

    Update:

    "So you mean that [heap] contains both .bss and part of heap. So, the only way to determine if an address is inside the heap is to trace malloc(), free(), ... calls?"

    I don't think I explained this well.

    The notion that there is a single region in the process space called "heap" is obsolete. A modern malloc implementation is likely to have multiple thread-specific arenas, obtained from the system via mmap, and a heap-allocated object can be in any one of them.

    You can't easily say "oh, this address 0x568901234 looks like heap", because it could be anything.

    "What is the standard way to determine the address ranges for virtual memory areas (e.g., .text, heap and .bss) of a process in Linux, if procfs output is obsolete?"

    Here again, you are trying to explain memory layout in terms that are somewhat obsolete: there isn't a single .text or .bss in most processes, because each shared library has its own (in addition to that of the main executable). There are many additional sections as well (.tls, .plt, .got, etc.). And sections aren't even required at runtime: ELF needs only segments at runtime and doesn't care about sections.
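Since segments are what exist at runtime, one way to look at them is glibc's dl_iterate_phdr(), which walks the program headers of the main executable and every loaded shared object. The sketch below is one possible approach, not a standard recipe: it looks up which PT_LOAD segment, if any, contains a given address. `struct seg_query`, `find_segment_cb`, `find_load_segment` and the `DEMO_MAIN` macro are names invented for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE       /* exposes dl_iterate_phdr() in <link.h> */
#endif
#include <link.h>
#include <stdint.h>
#include <stdio.h>

/* Lookup request and result, passed through dl_iterate_phdr's data pointer. */
struct seg_query {
    uintptr_t addr;       /* address being looked up */
    int found;            /* set to 1 when a PT_LOAD segment contains addr */
    uint32_t flags;       /* p_flags of that segment (PF_R | PF_W | PF_X) */
    const char *object;   /* dlpi_name of the owning object ("" is usually the main program) */
};

/* Called once per loaded object; a nonzero return stops the iteration. */
static int find_segment_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct seg_query *q = data;

    (void)size;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];

        if (ph->p_type != PT_LOAD)
            continue;
        /* Runtime range = load bias + link-time virtual address. */
        uintptr_t lo = info->dlpi_addr + ph->p_vaddr;
        uintptr_t hi = lo + ph->p_memsz;   /* p_memsz also covers .bss */

        if (q->addr >= lo && q->addr < hi) {
            q->found = 1;
            q->flags = ph->p_flags;
            q->object = info->dlpi_name;
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if p lies in a PT_LOAD segment of some loaded object, else 0. */
int find_load_segment(const void *p, struct seg_query *q)
{
    q->addr = (uintptr_t)p;
    q->found = 0;
    q->flags = 0;
    q->object = NULL;
    dl_iterate_phdr(find_segment_cb, q);
    return q->found;
}

#ifdef DEMO_MAIN  /* build with -DDEMO_MAIN to run standalone on Linux */
int zeroed_global = 0;   /* normally placed in .bss */

static void report(const char *label, void *p)
{
    struct seg_query q;

    if (find_load_segment(p, &q))
        printf("%-14s %p  %c%c%c  object \"%s\"\n", label, p,
               (q.flags & PF_R) ? 'r' : '-',
               (q.flags & PF_W) ? 'w' : '-',
               (q.flags & PF_X) ? 'x' : '-',
               q.object);
    else
        printf("%-14s %p  not in any PT_LOAD segment\n", label, p);
}

int main(void)
{
    int on_stack = 0;

    report("zeroed_global", &zeroed_global);
    report("on_stack", &on_stack);
    return 0;
}
#endif
```

Because p_memsz covers the zero-filled tail of the data segment, a .bss variable is reported as part of a writable PT_LOAD segment even when /proc/self/maps shows its page under [heap]. Stack and malloc'ed addresses belong to no loaded object, so this lookup says nothing about them.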