
Why do symbols of different types occupy the same length?


I compiled the code below with Cygwin GCC on an x64 machine:

gcc main.c -o main

(main.c)

long long mango = 13; // I also tried `char`, `short`, `int`
long melon = 2001;

int main(void)
{
    return 0;
}

Then I dumped the symbol values with nm:

./main:0000000100402010 D mango
./main:0000000100402018 D melon

As I understand it, a symbol's value is just its address. So mango is at address 0x100402010 and melon is at 0x100402018, which means mango occupies 8 bytes.

I tried other types for mango, such as char, short, and int. It always occupies 8 bytes.

Why doesn't the size change?

ADD 1

Thanks for the comment.

I just tried the code below:

typedef struct{
    char a1;
    char a2;
    char a3;
    char a4;
    char a5;
} MyStruct;

MyStruct MyObj1={1,2,3,4,5};

MyStruct MyObj2={1,2,3,4,5};

long long mango = 13;
long melon = 2001;

int main(void)
{
    return 0;
}

And this time, nm shows me this:

./main:0000000100402020 D mango
./main:0000000100402028 D melon
./main:0000000100402010 D MyObj1
./main:0000000100402015 D MyObj2

MyObj1 and MyObj2 are 5 bytes apart. So it is indeed up to the compiler to decide the padding.


Solution

  • From the GNU Binary Utilities documentation for nm:

    The symbol value, in the radix selected by options (see below), or hexadecimal by default.

    The symbol type. At least the following types are used; others are, as well, depending on the object file format. If lowercase, the symbol is usually local; if uppercase, the symbol is global (external). There are however a few lowercase symbols that are shown for special global symbols (u, v and w).

    Depending on pragma settings and default alignment boundaries, the distance between successive symbol addresses may be exactly the number of bytes for that symbol type, or it may include padding, which increases the apparent size of the symbol.

    A
    
        The symbol’s value is absolute, and will not be changed by further linking.
    B
    ...
    

    IMO the use of the word value in nm parlance is unfortunate, because in this context value denotes the symbol's address. The address of a symbol (its nm value) will not change. In normal C parlance, however, the value of a symbol can change, for example:

    int i = 0; // the address for symbol i will remain constant
    i = 10;    // but the value of the symbol i can change. 
    

    Regarding the size of addresses: the address of any symbol in a 64-bit build is always 8 bytes wide, while the address of any symbol in a 32-bit build is 4 bytes wide. These widths do not change, and they are not affected by the value stored in the object that the symbol names.

    Regarding the distance in memory between symbols: it depends on the type of each symbol, on how that type is aligned on the implementation's boundaries, and, as you have noted, on the compiler: "So it is indeed up to the compiler to decide the padding." Depending on pragma settings and default alignment boundaries, padding may place successive symbols farther apart than the sizeof values of their types alone would suggest. This is very common for char and struct symbols.