c++, reverse-engineering, decompiler

Why is there no accurate C++ decompiler?


Why is it not possible to create a C++ decompiler that will function as accurately as those made for Java and C#?


Solution

  • There are several reasons:

    1. Inlining. A lot of C++ code gets inlined in optimized builds. That plays havoc with any form of decompiler. To figure out that a function was inlined, the decompiler would have to recognize the copies of its body scattered across call sites and match them to one another. And post-inlining optimization steps can make each copy look very different, depending on where it was inlined.

    2. Templates. Template code relies heavily on #1, but templates create additional problems. A function that gets inlined in two places could, at least theoretically, compile to the same sequence of assembly instructions in both. Template code instantiated with different template arguments usually cannot: different instantiations will generally have to compile down to different sequences of instructions. This becomes even more difficult because template code can call different sets of functions depending on the template arguments, and those functions could themselves be inlined.

    3. Compile-time execution. Template metaprogramming allows the compiler to actually execute code, and C++11's constexpr provides a more natural way to do some computations at compile time. A function call or metafunction instantiation that is evaluated at compile time leaves no code in the compiled executable. Only its result ends up there (since that's kinda the point).

    4. Lack of comprehensive runtime reflection. C# and Java both lace their bytecode with a lot of information about the nature of the original source code. Object definitions are easily detectable, as are object names, member variable types and names, etc. C++ compiles down to machine language, which is not required to carry any such information. And since it isn't required, compilers don't generate it. Even the reflection study group of the ISO C++ committee is focused on compile-time reflection, which is information that won't be available at runtime.

      Even std::type_info doesn't offer much. If the compiler does not detect that a particular type will have typeid called on it, then it doesn't need to generate a std::type_info object for that type. And even when it does, all that gives you is the type's name, in an implementation-defined format, and an identifier. Nothing more.
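
To make points 2 through 4 a bit more concrete, here is a minimal sketch. The names in it (factorial, sum, Account and the small helper functions) are made up for illustration, and what a given compiler actually emits depends on the compiler and its optimization settings, so treat the comments as describing what is allowed or typical rather than guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <typeinfo>

// Point 3: a constexpr function evaluated during compilation.
// (Hypothetical example; not taken from any real code base.)
constexpr unsigned long factorial(unsigned n) {
    return n <= 1 ? 1UL : n * factorial(n - 1);
}

// The initializer must be a constant expression, so the compiler computes it.
// The binary only needs to hold the value 3628800; nothing in the executable
// has to record that it came from a recursive call to factorial(10).
constexpr unsigned long kTenFactorial = factorial(10);

// Point 2: one template in the source, one separate body per instantiation.
// sum<int> typically uses integer adds and sum<double> floating-point adds,
// so a decompiler sees two unrelated-looking functions (or none, if inlined).
template <typename T>
T sum(const T* values, std::size_t count) {
    T total = T();
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}

int sum_ints() {
    const int values[] = {1, 2, 3};
    return sum(values, 3);  // instantiates sum<int>
}

double sum_doubles() {
    const double values[] = {1.5, 2.0, 3.0};
    return sum(values, 3);  // instantiates sum<double>
}

// Point 4: the member names "id" and "balance" exist only in the source.
// The compiled code works with byte offsets into the object instead.
struct Account {
    int id;
    double balance;
};

// std::type_info exposes an implementation-defined name and supports
// comparison (plus hash_code), but says nothing about members or layout.
const char* account_type_name() {
    return typeid(Account).name();
}

bool account_differs_from_int() {
    return typeid(Account) != typeid(int);
}
```

Calling account_type_name() returns something like a mangled name on GCC and Clang and a more readable one on MSVC, which is about as much as the standard library will tell you about Account at runtime.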