c assembly static global-variables sections

Global variables and the .data section


Is a variable that is stored in the .data section by definition a global variable that has program scope? In other words, are the two terms synonymous, with one implying the other? Or would it be possible, for example, to have a global variable that is not stored in the .data section, or a label/variable that is not global?

Just to give a basic example:

// this is placed in the .data section and marked with a .globl directive
char global_int = 11;

int main(int argc, char * argv[])
{

}

Would compile to something like:

global_int:
        .byte   11
main:
    ...

But I'm trying to find out whether the two terms, "global" and "in the .data section", mean the same thing or whether there are counterexamples.


Solution

  • There are two different concepts: which "section" a variable goes into, and its "visibility".


    For comparison, I've added a .bss section variable:

    char global_int = 11;
    char nondata_int;
    
    int
    main(int argc, char *argv[])
    {
    }
    

    Compiling with cc -S produces:

        .file   "fix1.c"
        .text
        .globl  global_int
        .data
        .type   global_int, @object
        .size   global_int, 1
    global_int:
        .byte   11
        .comm   nondata_int,1,1
        .text
        .globl  main
        .type   main, @function
    main:
    .LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    %edi, -4(%rbp)
        movq    %rsi, -16(%rbp)
        movl    $0, %eax
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
    .LFE0:
        .size   main, .-main
        .ident  "GCC: (GNU) 8.3.1 20190223 (Red Hat 8.3.1-2)"
        .section    .note.GNU-stack,"",@progbits
    

    Note the .data directive, which puts global_int in the .data section, and the .comm directive, which makes nondata_int a common symbol that ends up in the .bss section.

    Also, note the .globl directive, which gives global_int global visibility (i.e. it can be seen by other .o files). Symbols declared with .comm are global as well.

    Loosely, .data and/or .bss are the sections that the variables are put into, and global [.globl] is the visibility. If you did:

    static int foobar = 63;
    

    Then, foobar would go into the .data section but be local. In the nm output below, instead of D, it would be d to indicate local/static visibility. Other .o files would not be able to see this [or link to it].


    Running nm on the .o file produces:

    0000000000000000 D global_int
    0000000000000000 T main
    0000000000000001 C nondata_int
    

    And, an nm -g of the final executable produces:

    000000000040401d B __bss_start
    0000000000404018 D __data_start
    0000000000404018 W data_start
    0000000000401050 T _dl_relocate_static_pie
    0000000000402008 R __dso_handle
    000000000040401d D _edata
    0000000000404020 B _end
    0000000000401198 T _fini
    000000000040401c D global_int
                     w __gmon_start__
    0000000000401000 T _init
    0000000000402000 R _IO_stdin_used
    0000000000401190 T __libc_csu_fini
    0000000000401120 T __libc_csu_init
                     U __libc_start_main@@GLIBC_2.2.5
    0000000000401106 T main
    000000000040401e B nondata_int
    0000000000401020 T _start
    0000000000404020 D __TMC_END__
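
To make the section/visibility split concrete, here is a small sketch that mixes the two in several ways. All of the names are made up for illustration, and the sections and nm letters in the comments are what GCC on an ELF target typically produces without optimization; treat them as expectations to check with nm, not guarantees.

```c
/* Hypothetical names; section and nm letters are typical for GCC on ELF. */

int        g_init  = 11;   /* .data,   global -> nm "D" */
static int s_init  = 63;   /* .data,   local  -> nm "d" */
static int s_zero;         /* .bss,    local  -> nm "b" */
const int  g_const = 7;    /* .rodata, global -> nm "R" */

/* A function-scope static still has static storage, but its name is
 * visible only inside the function. */
int next_id(void)
{
    static int counter = 100;   /* .data, local -> nm "d" */
    return ++counter;
}

/* Writing to the statics keeps the compiler from folding them away. */
void bump(void)
{
    s_init++;
    s_zero++;
}

/* Reads one variable of each kind. */
int sum_all(void)
{
    return g_init + s_init + s_zero + g_const;
}
```

Compiling this with cc -c and running nm on the resulting .o should show globals that are not in .data (g_const) and .data variables that are not global (s_init, counter), so neither term implies the other.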
    

    UPDATE:

    Thanks for this answer. Regarding "And, .comm to put nondata_int into the .bss section": could you please explain that a bit? I don't see any reference to .bss, so how are those two related?

    Sure. There's probably a more rigorous explanation, but loosely, when you do:

    char nondata_int;
    

    You are defining a "common" section variable [the historical origin is from Fortran's common].

    When linking [to create the final executable], if no other .o [or .a] has defined it with an initial value, it will be put into the .bss section as a B symbol.

    But, if another .o has defined it (e.g. define_it.c):

    char nondata_int = 43;
    

    There, define_it.o will put it in the .data section as a D symbol.

    Then, when you link the two:

    gcc -o executable fix1.o define_it.o
    

    Then, in executable, it will go to the .data section as a D symbol.

    So, .o files use .comm [the assembler directive], which nm reports as a C [common] symbol.

    Executables have only .data and .bss. So, when the .o files are linked, a common symbol goes to [is promoted to] .bss if no .o has initialized it, and to .data if any .o has initialized it.

    Loosely, .comm/C is a suggestion and .data and .bss is a "commitment"
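
Within a single source file, C has a closely related rule ("tentative definitions") that loosely mirrors what the linker does with common symbols across .o files. A minimal sketch, assuming made-up names (shared_value and read_shared are not from the example above):

```c
/* Tentative definitions: a file-scope declaration with no initializer may
 * appear more than once in the same translation unit. */
int shared_value;
int shared_value;

/* At most one declaration may carry an initializer. It becomes the actual
 * definition, and the tentative ones above refer to the same object. */
int shared_value = 43;

/* Reads the single merged object. */
int read_shared(void)
{
    return shared_value;
}
```

Note that merging across separate .o files depends on the compiler emitting common symbols at all. GCC 10 and later default to -fno-common, so reproducing the .comm output shown earlier on a newer GCC would need -fcommon.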

    This is a nicety of sorts. Technically, in fix1.c, if we knew beforehand that we were going to be linked with define_it.o, we would [probably] want to do:

    extern char nondata_int;
    

    Then, in fix1.o, the symbol would be marked as "undefined" (i.e. nm would show U).

    But, then, if fix1.o were not linked to anything that defined the symbol, the linker would complain about an undefined symbol.
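
As a rough sketch of the extern approach: the declaration and the definition would normally live in different source files, and they are placed in one file here only so the example links on its own. The helper read_nondata is hypothetical.

```c
/* Declaration only: no storage is reserved by this line. In a multi-file
 * program this is all that fix1.c would contain. */
extern char nondata_int;

/* Code can use the variable with just the declaration in scope. */
int read_nondata(void)
{
    return nondata_int;
}

/* The one definition, which would normally sit in define_it.c.
 * Remove it and the link fails with an undefined symbol. */
char nondata_int = 43;
```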

    The common symbol allows us to have multiple .o files that each do:

    char nondata_int;
    

    They all produce C symbols. The linker combines all to produce a single symbol.

    So, again, a common [C] symbol says:

    "I want a global named X and I want it to be the same X as found in any other .o files, but don't complain about the symbol being multiply defined. If one [and only one] of those .o files gives it an initialized value, I'd like to benefit from that value."

    Historically ...

    IIRC [and I could be wrong about this], common was added [to the linker] to support Fortran COMMON declarations/variables.

    That is, all Fortran .o files just declared a symbol as common [its concept of global], and the Fortran linker was expected to combine them.

    Classic/old Fortran could only specify a variable as COMMON (i.e. in C, equivalent to int val;), and it did not have global initializers (i.e. it had no equivalent of extern int val; or int val = 1;).

    This common was useful for C, so, at some point it was added.

    In the good old days (tm), the common linker type did not exist. Exactly one .o file had to define the variable, either with a value (e.g. int val = 1;) or without (e.g. int val;), and all other .o files had to use an explicit extern int val;