Tags: c, debugging, static-analysis, debug-symbols, dwarf

Map Var to Declaration Using Dwarf DebugInfo and Source Code


Given the line number of a variable access (not its declaration), how can I determine the variable's type (or its declaration DIE in the .debug_info tree)?

Look at the following code:

void foo()
{
   {
      struct A *b;
   }

   {
      struct B *b;

      b = malloc(sizeof(struct B));
   }
}

Suppose that I have this source code and it is compiled with debug information in DWARF format. How can I determine that variable b is of type struct B * using the source code and debug information?

I mean, how can I automate this offline? The problem is that the .debug_info section of DWARF contains no mapping between source locations (e.g., line numbers) and scope information. In the example above, the debug information tells us that there is a variable of type struct A * in one child block of foo() and a variable of type struct B * in another child block of foo(). Parsing the source code can determine the nesting level at which the access occurs, but that alone cannot map the accessed variable to its type, because two variables named b with different types are declared at the same level at which b is accessed.

If there is a way to force the compiler to include more information in the debug information, the problem can be solved. For example, adding DW_AT_high_pc and DW_AT_low_pc to the debug information of DIEs of type DW_TAG_lexical_block will help.


Solution

  • Here is the output of objdump --dwarf=info mplayer for MPlayer-1.3.0 compiled with the -gdwarf-2 option:

    <2><4000e>: Abbrev Number: 43 (DW_TAG_lexical_block)
    <3><4000f>: Abbrev Number: 37 (DW_TAG_variable)
    <40010>   DW_AT_name        : px
    <40013>   DW_AT_decl_file   : 1
    <40014>   DW_AT_decl_line   : 2079
    <40016>   DW_AT_type        : <0x38aed>
    <3><4001a>: Abbrev Number: 37 (DW_TAG_variable)
    <4001b>   DW_AT_name        : py
    <4001e>   DW_AT_decl_file   : 1
    <4001f>   DW_AT_decl_line   : 2080
    <40021>   DW_AT_type        : <0x38aed>
    <3><40025>: Abbrev Number: 0
    <2><40026>: Abbrev Number: 0
    

    As you can see, at offset 0x4000e there is a lexical block with no attributes. The corresponding source code is located at libvo/gl_common.c:2078:

    for (i = 0; i < 4; i++) {
      int px = 2*i;
      int py = 2*i + 1;
      mpglTexCoord2f(texcoords[px], texcoords[py]);
      if (is_yv12) {
        mpglMultiTexCoord2f(GL_TEXTURE1, texcoords2[px], texcoords2[py]);
        mpglMultiTexCoord2f(GL_TEXTURE2, texcoords2[px], texcoords2[py]);
      }
      if (use_stipple)
        mpglMultiTexCoord2f(GL_TEXTURE3, texcoords3[px], texcoords3[py]);
      mpglVertex2f(vertices[px], vertices[py]);
    }
    

    This lexical block is the body of a for loop. The dump contains many more lexical_block instances like it, none of which carries address or line attributes.
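To work with such a dump offline, the DIE nesting level and the DW_AT_decl_line values have to be read back out of the text. The following is only a rough sketch that assumes the exact line layout shown in the dump above; other objdump or readelf versions may format lines differently. The names die_header, parse_die_header and parse_decl_line are made up for this illustration.

```c
#include <stdio.h>

/* One DIE header line from `objdump --dwarf=info`, e.g.
 *   <2><4000e>: Abbrev Number: 43 (DW_TAG_lexical_block)            */
struct die_header {
    int depth;             /* nesting level printed in the first <...>  */
    unsigned long offset;  /* DIE offset printed in the second <...>    */
    char tag[64];          /* e.g. "DW_TAG_variable"                    */
};

/* Returns 1 and fills *out if `line` is a DIE header that names a tag.
 * Attribute lines and terminator entries ("Abbrev Number: 0") return 0
 * and leave *out untouched. */
static int parse_die_header(const char *line, struct die_header *out)
{
    struct die_header tmp;
    int abbrev;

    if (sscanf(line, " <%d><%lx>: Abbrev Number: %d (%63[^)])",
               &tmp.depth, &tmp.offset, &abbrev, tmp.tag) != 4)
        return 0;
    *out = tmp;
    return 1;
}

/* Returns 1 and stores the line number if `line` is a DW_AT_decl_line
 * attribute line such as "<40014>   DW_AT_decl_line   : 2079". */
static int parse_decl_line(const char *line, int *decl_line)
{
    return sscanf(line, " <%*x> DW_AT_decl_line : %d", decl_line) == 1;
}
```

Feeding every line of the dump through these two helpers gives, for each DW_TAG_variable, its nesting level and declaration line, which is the input the DebugInfo analysis below needs.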

    My solution consists of two parts:

    1) Source code analysis:

    Find the scope (surrounding left and right braces) where the target variable is accessed. In fact we only need to store the line number of the left brace.

    Find the level of the scope in the tree of scopes (a tree that shows parent/child relationships, similar to what can be found in .debug_info).

    At this point we have the start line of the scope corresponding to a variable access and the level of the scope in the tree of scopes (e.g., line 12 and level 2 in the code depicted in the original question).
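The source-analysis step can be approximated with a simple brace scan. The sketch below is not a real C parser: it skips comments, string literals and character literals, but it ignores the preprocessor and assumes the access line itself contains no braces. The names scope_info and find_enclosing_scope are hypothetical, and lines are counted from 1 at the start of the buffer passed in.

```c
#include <stddef.h>

#define MAX_DEPTH 128

struct scope_info {
    int start_line;  /* line of the '{' opening the innermost enclosing scope */
    int depth;       /* 1 = function body, 2 = block inside it, ...           */
};

/* Scan `src` up to (but not including) `access_line` and report the
 * innermost brace scope still open there. Returns 0 if no scope is open
 * or the nesting exceeds MAX_DEPTH. */
static int find_enclosing_scope(const char *src, int access_line,
                                struct scope_info *out)
{
    int stack[MAX_DEPTH];  /* start lines of the currently open scopes */
    int depth = 0, line = 1;
    enum { CODE, LINE_COMMENT, BLOCK_COMMENT, STRING, CHARLIT } st = CODE;

    for (const char *p = src; *p != '\0' && line < access_line; p++) {
        char c = *p;
        if (c == '\n') {
            line++;
            if (st == LINE_COMMENT) st = CODE;
            continue;
        }
        switch (st) {
        case CODE:
            if (c == '/' && p[1] == '/') st = LINE_COMMENT;
            else if (c == '/' && p[1] == '*') { st = BLOCK_COMMENT; p++; }
            else if (c == '"') st = STRING;
            else if (c == '\'') st = CHARLIT;
            else if (c == '{') {
                if (depth == MAX_DEPTH) return 0;
                stack[depth++] = line;
            }
            else if (c == '}' && depth > 0) depth--;
            break;
        case BLOCK_COMMENT:
            if (c == '*' && p[1] == '/') { st = CODE; p++; }
            break;
        case STRING:
        case CHARLIT:
            /* Skip the escaped character, but never swallow a newline. */
            if (c == '\\' && p[1] != '\0' && p[1] != '\n') p++;
            else if (c == (st == STRING ? '"' : '\'')) st = CODE;
            break;
        case LINE_COMMENT:
            break;
        }
    }
    if (depth == 0) return 0;
    out->start_line = stack[depth - 1];
    out->depth = depth;
    return 1;
}
```

Only the line of the opening brace is kept per scope, as described above; the closing brace is never needed because the scan stops at the access line.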

    2) DebugInfo analysis:

    Now we can analyze the appropriate CU and look for the declarations of the target variable. The important point is that only declarations with a line number smaller than the line number of the access point are valid. With that in mind, we search the global scope first and then continue through the deeper levels, in order.

    Declarations with scopes deeper than the scope of the access are invalid. Declarations with the same scope as the target variable are only valid if their line number is between the start line of the target scope and the line number of the variable access.
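The filtering rules above can be written down as a small selection function. This is a sketch under several assumptions: the candidates are all declarations in the CU with the target name, their depth has already been converted to the same convention as the source-side nesting level, and the type field stands in for whatever is resolved from DW_AT_type. The names decl_candidate and pick_declaration are invented for this example. When several candidates survive, the deepest one wins, and the latest declaration wins within the same depth.

```c
#include <stddef.h>

struct decl_candidate {
    int depth;         /* scope level of the declaration (source convention)   */
    int decl_line;     /* DW_AT_decl_line                                       */
    const char *type;  /* hypothetical: type name resolved from DW_AT_type      */
};

/* scope_start and scope_depth describe the scope that contains the access,
 * as produced by the source code analysis in part 1. Returns NULL if no
 * candidate is valid. */
static const struct decl_candidate *
pick_declaration(const struct decl_candidate *c, size_t n,
                 int scope_start, int scope_depth, int access_line)
{
    const struct decl_candidate *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (c[i].decl_line >= access_line)
            continue;  /* not declared before the access */
        if (c[i].depth > scope_depth)
            continue;  /* deeper than the scope of the access */
        if (c[i].depth == scope_depth && c[i].decl_line < scope_start)
            continue;  /* same level, but declared in an earlier sibling block */
        if (best == NULL || c[i].depth > best->depth ||
            (c[i].depth == best->depth && c[i].decl_line > best->decl_line))
            best = &c[i];
    }
    return best;
}
```

For the code in the question, the struct A * declaration sits at the same level as the access but before the start of the enclosing block, so it is rejected and the struct B * declaration is selected.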