
Why are the memory addresses of string literals so different from others', on Linux?


I noticed that on Linux, string literals have very different addresses from other constants and variables: they have many leading zeroes (which are not printed).

Example:

#include <stdio.h>

int main(void) {
    const char *h = "Hi";
    int i = 1;
    printf ("%p\n", (void *) h);
    printf ("%p\n", (void *) &i);
}

Output:

0x400634
0x7fffc1ef1a4c

I know they are stored in the .rodata part of the executable. Is there a special way the OS handles it afterwards, so the literals end up in a special area of memory (with leading zeroes)? Are there any advantages to that memory location, or is there something special about it?


Solution

  • Here's how process memory is laid out on Linux (from http://www.thegeekstuff.com/2012/03/linux-processes-memory-layout/):

    [Figure: Linux process memory layout]

    The .rodata section is a write-protected subsection of the Initialized Global Data block. (The ELF section named .data is its writable counterpart, holding writable globals initialized to nonzero values. Writable globals initialized to zero, or not explicitly initialized, go to the .bss block. By "globals" I mean every variable with static storage duration: global variables and static variables, wherever they are declared.)

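    The picture explains the numerical values of your addresses. The executable's own image (.text, .rodata, .data, .bss) is mapped near the bottom of the address space, starting at 0x400000 for a traditional non-position-independent x86-64 executable, so your literal at 0x400634 sits just above that base and its printed address is short. Local variables live on the stack, which is near the top of the user address space, hence 0x7fffc1ef1a4c. (A position-independent executable, the default on many newer distributions, is loaded at a randomized and much higher address instead.)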

    To investigate further on Linux, inspect the /proc/$pid/maps virtual files, which describe the memory layout of running processes. They don't show the ELF section names (the reserved names starting with a dot), but you can guess which ELF section a memory block came from by looking at its memory protection flags. For example, running

    $ cat /proc/self/maps #cat's memory map
    

    gives me

    00400000-0040b000 r-xp 00000000 fc:00 395465                             /bin/cat
    0060a000-0060b000 r--p 0000a000 fc:00 395465                             /bin/cat
    0060b000-0060d000 rw-p 0000b000 fc:00 395465                             /bin/cat
    006e3000-00704000 rw-p 00000000 00:00 0                                  [heap]
    3000000000-3000023000 r-xp 00000000 fc:00 3026487                        /lib/x86_64-linux-gnu/ld-2.19.so
    3000222000-3000223000 r--p 00022000 fc:00 3026487                        /lib/x86_64-linux-gnu/ld-2.19.so
    3000223000-3000224000 rw-p 00023000 fc:00 3026487                        /lib/x86_64-linux-gnu/ld-2.19.so
    3000224000-3000225000 rw-p 00000000 00:00 0
    3000400000-30005ba000 r-xp 00000000 fc:00 3026488                        /lib/x86_64-linux-gnu/libc-2.19.so
    30005ba000-30007ba000 ---p 001ba000 fc:00 3026488                        /lib/x86_64-linux-gnu/libc-2.19.so
    30007ba000-30007be000 r--p 001ba000 fc:00 3026488                        /lib/x86_64-linux-gnu/libc-2.19.so
    30007be000-30007c0000 rw-p 001be000 fc:00 3026488                        /lib/x86_64-linux-gnu/libc-2.19.so
    30007c0000-30007c5000 rw-p 00000000 00:00 0
    7f49eda93000-7f49edd79000 r--p 00000000 fc:00 2104890                    /usr/lib/locale/locale-archive
    7f49edd79000-7f49edd7c000 rw-p 00000000 00:00 0
    7f49edda7000-7f49edda9000 rw-p 00000000 00:00 0
    7ffdae393000-7ffdae3b5000 rw-p 00000000 00:00 0                          [stack]
    7ffdae3e6000-7ffdae3e8000 r--p 00000000 00:00 0                          [vvar]
    7ffdae3e8000-7ffdae3ea000 r-xp 00000000 00:00 0                          [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
    

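    The first r-xp block came from .text (executable code). With the older toolchain that built this cat, .rodata is mapped into that same read-only, executable block, which is also why the literal in the question sits at 0x400634, just above 0x400000; newer GNU ld versions, which by default separate code from read-only data on x86-64, give .rodata its own r--p block. The r--p block at 0060a000 is most likely the RELRO region (data such as the GOT that the dynamic linker makes read-only after relocation), and the rw-p block after it holds .data and .bss. (Between the heap and the stack block are blocks mapped from shared libraries, plus other memory-mapped files such as the locale archive.)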


    Note: The C standard requires the argument for "%p" to be a void *, so cast other object pointers such as int * to (void *); otherwise the behavior is undefined.