java c++ c reverse-engineering decompiling

Decompile C code with debug info?

Java and Python bytecode are relatively easier to decompile than the machine code generated by a C/C++ compiler.

I am unable to find a convincing answer as to why the information from the -g option is insufficient for de-compilation, but sufficient for debugging? What is the extra stuff contained in Python/Java byte code, that makes decompilation easy?

Solution

I am unable to find a convincing answer as to why the information from the -g option is insufficient for de-compilation, but sufficient for debugging?

The debugging information basically contains only a mapping between addresses in the generated code and line numbers in the source files. The debugger does not need to decompile the code; it just shows you the original sources. If the source files are missing, the debugger won't magically show them.
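As a rough illustration of that mapping, the sketch below models a line table as sorted rows of (start address, file, line) and finds the row covering a given address. The addresses, file name and line numbers are invented for illustration; real formats such as DWARF encode this information far more compactly.

```python
import bisect

# Hypothetical line table: (start address, source file, line number),
# sorted by address. All values are made up for this sketch.
LINE_TABLE = [
    (0x4050A0, "example.c", 120),
    (0x4050A8, "example.c", 122),
    (0x4050B6, "example.c", 124),
    (0x4050C0, "example.c", 126),
]

def lookup(address):
    """Return (file, line) for the row covering address, or None."""
    starts = [row[0] for row in LINE_TABLE]
    # Index of the last row whose start address is <= address.
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return None
    _, filename, line = LINE_TABLE[i]
    return filename, line

print(lookup(0x4050AA))
```

Note that the lookup yields only a file name and a line number. To display the statement itself, a debugger would still have to open that file, which is why nothing can be shown when the sources are missing.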

That said, the presence of debugging info does make decompilation easier. If the debug info includes the layout of the types used and the function prototypes, the decompiler can use it to produce a much more precise decompilation. In many cases, however, the result will still likely differ from the original source.

For example, here's a function decompiled with the Hex-Rays decompiler without using the debug info:

int __stdcall sub_4050A0(int a1)
{
  int result; // eax@1

  result = a1;
  if ( *(_BYTE *)(a1 + 12) )
  {
    result = sub_404600(*(_DWORD *)a1);
    *(_BYTE *)(a1 + 12) = 0;
  }
  return result;
}

Since it does not know the type of a1, the accesses to its fields are represented as additions and casts.

And here's the same function after the symbol file has been loaded:

void __thiscall mytree::write_page(mytree *this, PAGE *src)
{
  if ( src->isChanged )
  {
    cache::set_changed(this->cache, src->baseAddr);
    src->isChanged = 0;
  }
}

You can see that it's been improved quite a lot.
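The difference comes down to whether the structure layout is known. Here is a minimal sketch, assuming a hypothetical PAGE layout that is merely consistent with the offsets in the first listing: a 4-byte field at offset 0 and a 1-byte flag at offset 12. The `unknown` filler field and the field sizes are guesses, not taken from the real symbol file. It shows that the raw-offset access and the named-field access touch the same byte:

```python
import ctypes

class PAGE(ctypes.Structure):
    # Hypothetical layout; only offsets 0 and 12 are implied by the listings.
    _fields_ = [
        ("baseAddr", ctypes.c_uint32),      # offset 0
        ("unknown", ctypes.c_uint8 * 8),    # offsets 4..11, contents not known
        ("isChanged", ctypes.c_uint8),      # offset 12
    ]

buf = bytearray(ctypes.sizeof(PAGE))
page = PAGE.from_buffer(buf)  # typed view over the same bytes

# Without type info, roughly: *(_BYTE *)(a1 + 12) = 1
buf[12] = 1

# With type info, the same byte is reachable by name.
print(PAGE.isChanged.offset, page.isChanged)
```

With only the untyped view, a decompiler can report nothing more than "a byte at offset 12". The field name `isChanged` exists solely in the type description, which is what the symbol file supplies.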

As for why decompiling bytecode is usually easier, in addition to NPE's answer check also this.
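One concrete way to see what bytecode retains, offered only as an illustration and not as a summary of the answers referenced above: a CPython code object keeps the function name, the argument and local variable names, and the names of the attributes it touches, even without any debug option. The `write_page` function below is a hypothetical Python analogue of the earlier example.

```python
def write_page(cache, src):
    # Hypothetical analogue of the decompiled function shown earlier.
    if src.is_changed:
        cache.set_changed(src.base_addr)
        src.is_changed = False

code = write_page.__code__
print(code.co_name)      # name of the function
print(code.co_varnames)  # argument and local variable names
print(code.co_names)     # attribute and global names referenced
```

A decompiler working from such bytecode starts with most identifiers already in hand, whereas stripped machine code offers only addresses and offsets.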