
What does (.eh) mean in nm output?


When I look at the symbols in my library, nm mylib.a, I see some duplicate entries that look like this:

000000000002d130 S __ZN7quadmat11SpAddLeavesC1EPNS_14BlockContainerEPy
00000000000628a8 S __ZN7quadmat11SpAddLeavesC1EPNS_14BlockContainerEPy.eh

When piped through c++filt:

000000000002d130 S quadmat::SpAddLeaves::SpAddLeaves(quadmat::BlockContainer*, unsigned long long*)
00000000000628a8 S quadmat::SpAddLeaves::SpAddLeaves(quadmat::BlockContainer*, unsigned long long*) (.eh)

What does that .eh mean, and what is this extra symbol used for?

I see it has something to do with exception handling. But why does that use an extra symbol?

(I'm noticing this with clang)


Solution

  • Here's some simple code:

    bool extenrnal_variable;
    
    int f(...)
    {
        if (extenrnal_variable)
            throw 0;
    
        return 42;
    }
    
    int g()
    {
        return f(1, 2, 3);
    }
    

    I added extenrnal_variable to prevent the compiler from optimizing all the branches away. f takes a variadic parameter list (...) to prevent it from being inlined.

    When compiled with:

    $ clang++ -S -O3 -m32 -o - eh.cpp | c++filt
    

    it emits the following code for g() (the rest is omitted):

    g():                                 ## @_Z1gv
        .cfi_startproc
    ## BB#0:
        pushl   %ebp
    Ltmp9:
        .cfi_def_cfa_offset 8
    Ltmp10:
        .cfi_offset %ebp, -8
        movl    %esp, %ebp
    Ltmp11:
        .cfi_def_cfa_register %ebp
        subl    $24, %esp
        movl    $3, 8(%esp)
        movl    $2, 4(%esp)
        movl    $1, (%esp)
        calll   f(...)
        movl    $42, %eax
        addl    $24, %esp
        popl    %ebp
        ret
        .cfi_endproc
    

    All these .cfi_* directives are there for stack unwinding when an exception is thrown. They are all compiled into an FDE (Frame Description Entry) block that is saved under the name g().eh (__Z1gv.eh when mangled). The directives specify where on the stack the CPU registers are saved. When an exception is thrown and the stack is being unwound, the rest of the function's code must not be executed (except for the destructors of locals), but the registers that were saved earlier must be restored. These tables store exactly that information.

    These tables can be dumped with the dwarfdump tool:

    $ dwarfdump --eh-frame --english eh.o | c++filt
    

    The output:

    0x00000018: FDE
            length: 0x00000018
       CIE_pointer: 0x00000000
        start_addr: 0x00000000 f(...)
        range_size: 0x0000004d (end_addr = 0x0000004d)
      Instructions: 0x00000000: CFA=esp+4     eip=[esp]
                    0x00000001: CFA=esp+8     ebp=[esp]  eip=[esp+4]
                    0x00000003: CFA=ebp+8     ebp=[ebp]  eip=[ebp+4]
                    0x00000007: CFA=ebp+8     ebp=[ebp]  esi=[ebp-4]  eip=[ebp+4]
    
    0x00000034: FDE
            length: 0x00000018
       CIE_pointer: 0x00000000
        start_addr: 0x00000050 g()
        range_size: 0x0000002c (end_addr = 0x0000007c)
      Instructions: 0x00000050: CFA=esp+4     eip=[esp]
                    0x00000051: CFA=esp+8     ebp=[esp]  eip=[esp+4]
                    0x00000053: CFA=ebp+8     ebp=[ebp]  eip=[ebp+4]
    

    The format of this block, and a more compact alternative way of representing the same information, are documented separately. Basically, the block describes which registers to restore during stack unwinding and where on the stack to find them.
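As a rough illustration of what an unwinder does with one row of such a table, the sketch below applies the last row listed for g() (CFA=ebp+8, ebp=[ebp], eip=[ebp+4]) to a made-up stack. The Regs and Row types, the unwind_one_frame function, and the addresses in the usage note are all hypothetical, and only esp, ebp, and eip are modeled; a real unwinder handles every register and rule kind.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Only the three registers that appear in the dwarfdump rows are modeled.
struct Regs {
    std::uint32_t esp, ebp, eip;
};

// One row of the table, e.g. "CFA=ebp+8  ebp=[ebp]  eip=[ebp+4]".
// Save locations are stored relative to the CFA, as the FDE encodes them.
struct Row {
    bool cfa_from_ebp;        // true: CFA = ebp + cfa_offset, false: CFA = esp + cfa_offset
    std::int32_t cfa_offset;
    bool ebp_saved;           // false before "pushl %ebp" has executed
    std::int32_t ebp_at;      // where the caller's ebp was saved, relative to CFA
    std::int32_t eip_at;      // where the return address lives, relative to CFA
};

// Recover the caller's registers from the callee's registers and one row.
// The stack is a toy address -> value map, not real memory.
Regs unwind_one_frame(const Regs& callee, const Row& row,
                      const std::map<std::uint32_t, std::uint32_t>& stack) {
    const std::uint32_t base = row.cfa_from_ebp ? callee.ebp : callee.esp;
    const std::uint32_t cfa = base + static_cast<std::uint32_t>(row.cfa_offset);

    const std::uint32_t eip_addr = cfa + static_cast<std::uint32_t>(row.eip_at);
    assert(stack.count(eip_addr) == 1);  // the return address must be on our toy stack

    Regs caller;
    caller.esp = cfa;  // the CFA is the caller's esp just before the call
    caller.eip = stack.at(eip_addr);
    caller.ebp = row.ebp_saved
                     ? stack.at(cfa + static_cast<std::uint32_t>(row.ebp_at))
                     : callee.ebp;  // not saved yet: the value is still live
    return caller;
}
```

For example, with ebp = 0xff8, a saved ebp of 0x2000 stored at 0xff8, and a return address of 0x1234 stored at 0xffc, the row {true, 8, true, -8, -4} yields CFA = 0x1000 and caller registers esp = 0x1000, ebp = 0x2000, eip = 0x1234. No code from the callee runs; the values are simply read back from where the table says they were saved.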

    To see the raw content of these symbols you can list all the symbols with their offsets:

    $ nm -n eh.o
    
    00000000 T __Z1fz
             U __ZTIi
             U ___cxa_allocate_exception
             U ___cxa_throw
    00000050 T __Z1gv
    000000a8 s EH_frame0
    000000c0 S __Z1fz.eh
    000000dc S __Z1gv.eh
    000000f8 S _extenrnal_variable
    

    And then dump the (__TEXT,__eh_frame) section:

    $ otool -s __TEXT __eh_frame eh.o
    
    eh.o:
    Contents of (__TEXT,__eh_frame) section
    000000a8    14 00 00 00 00 00 00 00 01 7a 52 00 01 7c 08 01
    000000b8    10 0c 05 04 88 01 00 00 18 00 00 00 1c 00 00 00
    000000c8    38 ff ff ff 4d 00 00 00 00 41 0e 08 84 02 42 0d
    000000d8    04 44 86 03 18 00 00 00 38 00 00 00 6c ff ff ff
    000000e8    2c 00 00 00 00 41 0e 08 84 02 42 0d 04 00 00 00
    

    By matching the offsets you could see how each symbol is encoded.
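To make the matching concrete, here is a minimal decoder sketch for the FDE bytes in the dump above. It assumes what this particular dump appears to use: a 4-byte length, a 4-byte pc-relative start address (the 0x10 pointer-encoding byte in the CIE), a code alignment factor of 1, and a data alignment factor of -4 (the 0x7c byte in the CIE). The register numbering is inferred from the dump itself, and only the few DW_CFA opcodes that occur here are handled. The names Bytes, Fde, and decode_fde are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Numbering inferred from the dump: the CIE sets the CFA to "register 5 + 4"
// where dwarfdump prints CFA=esp+4, so 5 is esp and 4 is ebp here.
const char* reg_name(unsigned r) {
    static const char* const names[] = {"eax", "ecx", "edx", "ebx", "ebp",
                                        "esp", "esi", "edi", "eip"};
    return r < 9 ? names[r] : "?";
}

// Little-endian 32-bit read.
std::uint32_t read_u32(const Bytes& b, std::size_t at) {
    return std::uint32_t(b[at]) | std::uint32_t(b[at + 1]) << 8 |
           std::uint32_t(b[at + 2]) << 16 | std::uint32_t(b[at + 3]) << 24;
}

// Unsigned LEB128 read; advances i past the value.
std::uint64_t read_uleb128(const Bytes& b, std::size_t& i) {
    std::uint64_t value = 0;
    unsigned shift = 0;
    while (i < b.size()) {
        const std::uint8_t byte = b[i++];
        value |= std::uint64_t(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) break;
        shift += 7;
    }
    return value;
}

struct Fde {
    std::uint32_t length = 0;  // number of bytes after the length field
    std::uint32_t start = 0;   // first code address covered
    std::uint32_t range = 0;   // number of code bytes covered
    std::vector<std::string> ops;
};

// b holds one FDE; addr is its offset in the dump (the nm value of the .eh symbol).
Fde decode_fde(const Bytes& b, std::uint32_t addr) {
    assert(b.size() >= 17);
    Fde f;
    f.length = read_u32(b, 0);
    // Bytes 4..7 hold the distance back to the CIE; not needed here.
    f.start = addr + 8 + read_u32(b, 8);  // pc-relative, wraps modulo 2^32
    f.range = read_u32(b, 12);
    std::size_t i = 16;
    const std::uint64_t aug_len = read_uleb128(b, i);  // 0 in this dump
    i += static_cast<std::size_t>(aug_len);
    const std::size_t end = 4 + f.length;
    while (i < end && i < b.size()) {
        const std::uint8_t op = b[i++];
        const unsigned low = op & 0x3f;
        if ((op & 0xc0) == 0x40) {         // DW_CFA_advance_loc
            f.ops.push_back("advance_loc " + std::to_string(low));
        } else if ((op & 0xc0) == 0x80) {  // DW_CFA_offset
            const long off = static_cast<long>(read_uleb128(b, i)) * -4;
            f.ops.push_back(std::string("offset ") + reg_name(low) + " at CFA" +
                            (off < 0 ? "" : "+") + std::to_string(off));
        } else if (op == 0x0e) {           // DW_CFA_def_cfa_offset
            f.ops.push_back("def_cfa_offset " + std::to_string(read_uleb128(b, i)));
        } else if (op == 0x0d) {           // DW_CFA_def_cfa_register
            const unsigned reg = static_cast<unsigned>(read_uleb128(b, i));
            f.ops.push_back(std::string("def_cfa_register ") + reg_name(reg));
        } else if (op != 0x00) {           // 0x00 is DW_CFA_nop padding
            f.ops.push_back("unknown");
            break;
        }
    }
    return f;
}
```

Called with the 28 bytes starting at 0xdc (the __Z1gv.eh symbol) and addr = 0xdc, this gives start 0x50, range 0x2c, and the operations advance_loc 1, def_cfa_offset 8, offset ebp at CFA-8, advance_loc 2, def_cfa_register ebp. Those line up with the three rows dwarfdump prints for g() at 0x50, 0x51, and 0x53.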

    When there are local variables present, they have to be destroyed during stack unwinding. For that, more code is usually embedded in the functions themselves, and some additional, larger tables are created. You can explore that yourself by adding a local variable with a non-trivial destructor to g, compiling, and looking at the assembly output.
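One possible starting point for that experiment is sketched below. It is a variation on the code at the top of this answer; Tracer, events, should_throw, and call_g are names invented for this example. With a local like t in g, the assembly from clang++ -S typically gains a cleanup landing pad that runs the destructor, plus an exception table describing it, but the exact output depends on the compiler version and flags.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which things happen (hypothetical helper).
std::vector<std::string> events;

// A type with a non-trivial destructor, so g() needs cleanup code.
struct Tracer {
    ~Tracer() { events.push_back("~Tracer"); }
};

// Stands in for the external flag of the original example.
bool should_throw = true;

// Variadic, as in the original, to discourage inlining.
int f(...) {
    if (should_throw)
        throw 0;
    return 42;
}

int g() {
    Tracer t;  // must be destroyed if f() throws
    return f(1, 2, 3);
}

// Calls g() and reports what happened during unwinding.
int call_g() {
    try {
        return g();
    } catch (int) {
        // By the time the handler runs, unwinding has already destroyed t.
        assert(!events.empty() && events.back() == "~Tracer");
        events.push_back("caught");
        return -1;
    }
}
```

Calling call_g() once returns -1 and leaves events holding "~Tracer" followed by "caught", which shows the destructor running as part of unwinding, before the handler is entered.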
