When I compile the following C program with GCC and run it on my Linux x86_64 machine:
#include <stdio.h>
int main(void)
{
    char *p1 = "hello"; // Pointers to strings
    char *p2 = "hello"; // Pointers to strings
    if (p1 == p2) { // They are equal
        // %p expects a void *, hence the casts
        printf("equal %p %p\n", (void *)p1, (void *)p2); // equal 0x40064c 0x40064c
        // This is always the output on my machine
    }
    else {
        printf("NotEqual %p %p\n", (void *)p1, (void *)p2);
    }
}
I always get the output as:
equal 0x40064c 0x40064c
I understand that strings are stored in a constant table, but the addresses are very low compared to dynamically allocated memory.
Compare with the following program:
#include <stdio.h>
int main(void)
{
    char p1[] = "hello"; // char array
    char p2[] = "hello"; // char array
    if (p1 == p2) {
        printf("equal %p %p\n", (void *)p1, (void *)p2);
    }
    else { // Never equal
        printf("NotEqual %p %p\n", (void *)p1, (void *)p2); // NotEqual 0x7fff4b25f720 0x7fff4b25f710
        // Different pointers every time
        // Pointer values too large
    }
}
The two pointers are not equal, because these are two separate arrays that can be manipulated independently.
I want to know how GCC generates the code for these two programs and how they are mapped to memory during execution. Since this has probably been documented many times already, any links to documentation are welcome as well.
In both cases the compiler emits the actual bytes of the string "hello" just once, in the .rodata section of the program (rodata stands for read-only data).
That section is mapped directly from the executable file into memory, much like the code section. That is why its addresses are so far from those of the stack and of dynamically allocated memory.
Then:
char *p = "hello";
simply initializes p to the address of this (read-only) data.
And obviously:
char *q = "hello";
gets the very same address. This is called string pooling; it is a popular but optional compiler optimization.
But when you write:
char p[] = "hello";
It will probably generate something like this:
char p[6];
memcpy(p, "hello", 6);
Here "hello" is actually the address of the read-only pooled string. The call to memcpy is for illustration purposes only; the compiler may very well do the copy inline rather than with a function call.
If later you do:
char q[] = "hello";
It will define another array and another memcpy(). So same data, but different addresses.
But where will these array variables reside? Well, that depends.
If they are local, non-static variables: on the stack.
If they are global variables: in the .data section of the executable, and they will be saved there with the correct characters already in place, so no memcpy is needed at run time. Which is nice, because that memcpy would have to be executed before main.
If they are local static variables: exactly the same as global variables. Together, both kinds are called variables of static storage duration.
About the documentation links: sorry, I don't know of any.
But who needs documentation if you can do the experiments yourself? For that, the best tool around is objdump: it can disassemble the program, dump the data sections, and much more!
I hope this answers your questions.