

Where/how does Apple's GCC store DWARF inside an executable?

I compiled a binary with gcc -gdwarf-2 (Apple's GCC). However, neither objdump -g nor objdump -h shows any debug information.

libbfd does not find any debug information either. (I asked about this on the binutils mailing list.)

However, I am able to extract the debug information with dsymutil (into a dSYM), and libbfd can then read it.


Solution

  • On Mac OS X, the decision was made to have the linker (ld) not process all of the debug information when you link your program. The debug information is often 10x the size of the program executable, so having the linker process all of it and include it in the executable binary was a serious detriment to link times. For iterative development (compile, link, compile, link, debug, compile, link) this was a real hit.

    Instead:

    • The compiler generates the DWARF debug information in the .s files, and the assembler outputs it in the .o files.
    • The linker includes a "debug map" in the executable binary which tells debug info users where all of the symbols were relocated during the link.

    A consumer (doing .o-file debugging) loads the debug map from the executable and processes all of the DWARF in the .o files as-needed, remapping the symbols as per the debug map's instructions.
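To make the consumer side concrete, here is a minimal Python sketch of the lookup a debugger doing .o-file debugging has to perform: given a PC in the linked executable, find which .o file holds the DWARF for it and the corresponding address inside that .o. The DebugMapEntry and DebugMap names, and the simplified one-entry-per-function layout, are hypothetical illustrations, not the actual format ld writes.

```python
from bisect import bisect_right
from typing import Iterable, NamedTuple, Optional, Tuple


class DebugMapEntry(NamedTuple):
    """One function in a simplified, hypothetical debug map."""
    symbol: str
    object_file: str   # .o file that holds this function's DWARF
    object_addr: int   # start address inside the .o file
    linked_addr: int   # start address assigned by the linker
    size: int


class DebugMap:
    def __init__(self, entries: Iterable[DebugMapEntry]) -> None:
        # Keep entries sorted by linked address so a PC can be found by bisection.
        self._entries = sorted(entries, key=lambda e: e.linked_addr)
        self._starts = [e.linked_addr for e in self._entries]

    def to_object(self, pc: int) -> Optional[Tuple[str, int]]:
        """Map an address in the executable to (.o file, address in that .o)."""
        i = bisect_right(self._starts, pc) - 1
        if i < 0:
            return None  # below the first known function
        entry = self._entries[i]
        if pc >= entry.linked_addr + entry.size:
            return None  # not inside any function listed in the map
        return entry.object_file, entry.object_addr + (pc - entry.linked_addr)


dm = DebugMap([
    DebugMapEntry("_main", "main.o", 0x0, 0x100000F50, 0x30),
    DebugMapEntry("_helper", "util.o", 0x10, 0x100000F80, 0x20),
])
print(dm.to_object(0x100000F88))  # ('util.o', 24)
```

The returned .o-relative address is what the debugger would then look up in that object file's DWARF.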

    dsymutil can be thought of as a debug info linker. It performs the same process (read the debug map, load the DWARF from the .o files, relocate all the addresses) and then outputs a single binary containing all of the DWARF at its final, linked addresses. This is the dSYM bundle.
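The dsymutil direction is the reverse of a debugger's on-demand lookup: every address in the .o files' DWARF is rewritten to its linked value once. The sketch below assumes a hypothetical, heavily simplified model in which each DWARF subprogram is reduced to a name plus a low_pc/high_pc pair, and the debug map is a dict keyed by (object file, symbol). Real DWARF has many more address-bearing pieces (line tables, location lists, address ranges) that a real debug info linker must also handle.

```python
from typing import Dict, List, NamedTuple, Tuple


class ObjFunction(NamedTuple):
    """Stand-in for a DWARF subprogram: a name plus its address range."""
    name: str
    low_pc: int
    high_pc: int  # one past the last byte


def link_debug_info(
    objects: Dict[str, List[ObjFunction]],
    debug_map: Dict[Tuple[str, str], int],
) -> List[ObjFunction]:
    """Relocate every function from .o-relative to final linked addresses.

    debug_map maps (object file, symbol) to the symbol's linked start address.
    Functions without an entry are skipped, as if the linker had dropped them.
    """
    linked: List[ObjFunction] = []
    for obj_path, functions in objects.items():
        for fn in functions:
            start = debug_map.get((obj_path, fn.name))
            if start is None:
                continue
            delta = start - fn.low_pc  # how far the linker moved this function
            linked.append(ObjFunction(fn.name, fn.low_pc + delta, fn.high_pc + delta))
    return sorted(linked, key=lambda f: f.low_pc)


objects = {
    "main.o": [ObjFunction("_main", 0x0, 0x30)],
    "util.o": [ObjFunction("_unused", 0x0, 0x10), ObjFunction("_helper", 0x10, 0x30)],
}
debug_map = {("main.o", "_main"): 0x100000F50, ("util.o", "_helper"): 0x100000F80}
for f in link_debug_info(objects, debug_map):
    print(f"{f.name} {f.low_pc:#x}-{f.high_pc:#x}")
# _main 0x100000f50-0x100000f80
# _helper 0x100000f80-0x100000fa0
```

Skipping functions that have no debug map entry is a design choice of this sketch; it keeps debug info for code absent from the final binary out of the output.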

    Once you have a dSYM bundle, you've got plain old standard DWARF that any DWARF-reading tool (which can deal with Mach-O binary files) can process.

    There is an additional refinement that makes all of this work: the UUIDs included in Mach-O binaries. Every time the linker creates a binary, it emits a 128-bit UUID in the LC_UUID load command (see otool -hlv or dwarfdump --uuid). This uniquely identifies that binary file. When dsymutil creates the dSYM, it includes that UUID. The debuggers will only associate a dSYM and an executable if they have matching UUIDs; there is no reliance on dodgy file modification timestamps or anything like that.
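Reading the UUID yourself only requires walking the load commands. This Python sketch assumes a thin (not fat/universal) Mach-O image already read into memory. It looks for LC_UUID (0x1b) and compares the executable's UUID with the one in the dSYM's DWARF file, which is itself a Mach-O file. The function names read_uuid and same_build are illustrative.

```python
import struct
import uuid
from typing import Optional

LC_UUID = 0x1B

# First four bytes of a thin Mach-O file, read as a little-endian uint32,
# mapped to (struct byte-order prefix, size of the mach_header in bytes).
_MAGICS = {
    0xFEEDFACE: ("<", 28),  # 32-bit, little-endian
    0xFEEDFACF: ("<", 32),  # 64-bit, little-endian
    0xCEFAEDFE: (">", 28),  # 32-bit, big-endian
    0xCFFAEDFE: (">", 32),  # 64-bit, big-endian
}


def read_uuid(data: bytes) -> Optional[uuid.UUID]:
    """Return the LC_UUID of a thin Mach-O image, or None if there is none."""
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic not in _MAGICS:
        raise ValueError("not a thin Mach-O image (fat/universal files not handled)")
    endian, header_size = _MAGICS[magic]
    # ncmds follows magic, cputype, cpusubtype and filetype in the header.
    (ncmds,) = struct.unpack_from(endian + "I", data, 16)
    offset = header_size
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from(endian + "II", data, offset)
        if cmd == LC_UUID:
            # uuid_command: cmd, cmdsize, then the 16 raw UUID bytes.
            return uuid.UUID(bytes=data[offset + 8 : offset + 24])
        offset += cmdsize
    return None


def same_build(executable: bytes, dsym_dwarf: bytes) -> bool:
    """True if both images carry the same, non-missing UUID."""
    exe_uuid = read_uuid(executable)
    return exe_uuid is not None and exe_uuid == read_uuid(dsym_dwarf)
```

For example, passing the bytes of a single-architecture executable should yield the same UUID that dwarfdump --uuid reports (Python prints it in lowercase).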

    We can also use the UUIDs to locate the dSYMs for binaries. They show up in crash reports, and we include a Spotlight importer that you can use to search for them, e.g. mdfind "com_apple_xcode_dsym_uuids == E21A4165-29D5-35DC-D08D-368476F85EE1" if the dSYM is in a Spotlight-indexed location. You can even keep a repository of dSYMs for your company, with a program that retrieves the correct dSYM given a UUID (maybe backed by a little MySQL database or something similar), so you can run the debugger on a random executable and instantly have all the debug info for it. There are some pretty neat things you can do with the UUIDs.
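As a sketch of the "repository of dSYMs" idea, the following Python keeps a UUID-to-dSYM-path index. SQLite from the standard library stands in for the MySQL database mentioned above purely to keep the example self-contained; the DsymRepository class, the table layout, and any paths are hypothetical. mdfind_query only builds the Spotlight query string in the form quoted above.

```python
import sqlite3
import uuid
from typing import Optional


def mdfind_query(u: uuid.UUID) -> str:
    """Spotlight query string for the dSYM with this UUID."""
    return f"com_apple_xcode_dsym_uuids == {str(u).upper()}"


class DsymRepository:
    """Minimal UUID -> dSYM path index (SQLite standing in for MySQL)."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS dsyms (uuid TEXT PRIMARY KEY, path TEXT NOT NULL)"
        )

    def add(self, u: uuid.UUID, dsym_path: str) -> None:
        with self._db:  # commits the transaction on success
            self._db.execute(
                "INSERT OR REPLACE INTO dsyms (uuid, path) VALUES (?, ?)",
                (str(u).upper(), dsym_path),
            )

    def find(self, u: uuid.UUID) -> Optional[str]:
        row = self._db.execute(
            "SELECT path FROM dsyms WHERE uuid = ?", (str(u).upper(),)
        ).fetchone()
        return row[0] if row else None
```

On a Mac, the query string could be handed to Spotlight with subprocess.run(["mdfind", mdfind_query(u)], capture_output=True, text=True); the repository lookup works anywhere.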

    But anyway, to answer your original question: the unstripped binary has the debug map, the .o files have the DWARF, and when dsymutil is run, these are combined to create the dSYM bundle.

    If you want to see the debug map entries, run nm -pa executable and they're all there. They are in the form of the old stabs nlist records (the linker already knew how to process stabs, so it was easiest to use them), but you'll see how it works without much trouble; refer to some stabs documentation if you're uncertain.
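To show how those stabs records form the debug map, here is a Python sketch that groups function entries under the object file named by the preceding N_OSO record. It assumes the symbol table has already been decoded into hypothetical (n_type, n_value, name) tuples. It follows the pattern Apple's linker uses for functions: a named N_FUN carrying the linked address, then an unnamed N_FUN carrying the size. Data symbols (N_STSYM, N_GSYM) and other stab types are deliberately ignored.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Stab n_type values from <mach-o/stab.h>; N_OSO is Apple's addition.
N_FUN = 0x24
N_SO = 0x64
N_OSO = 0x66
N_STAB = 0xE0  # if any of these bits are set, the nlist entry is a stab

# Hypothetical pre-decoded nlist entry: (n_type, n_value, name).
Entry = Tuple[int, int, str]


def build_debug_map(symtab: Iterable[Entry]) -> Dict[str, List[Tuple[str, int, int]]]:
    """Return {object file: [(function, linked address, size), ...]}."""
    debug_map: Dict[str, List[Tuple[str, int, int]]] = {}
    current_obj: Optional[str] = None
    pending: Optional[Tuple[str, int]] = None  # named N_FUN awaiting its size
    for n_type, n_value, name in symtab:
        if not n_type & N_STAB:
            continue  # ordinary symbol, not part of the debug map
        if n_type == N_OSO:
            current_obj = name  # n_value holds the .o file's modification time
            debug_map.setdefault(current_obj, [])
        elif n_type == N_SO and not name:
            current_obj, pending = None, None  # empty N_SO closes the unit
        elif n_type == N_FUN and current_obj is not None:
            if name:
                pending = (name, n_value)  # first N_FUN: linked start address
            elif pending is not None:
                # second, unnamed N_FUN: n_value is the function's size
                debug_map[current_obj].append((pending[0], pending[1], n_value))
                pending = None
    return debug_map
```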