c++ objdump dwarf

Recompile C++ binary from debug info


This is more out of curiosity than practical need, but I have been wondering whether it is possible to extract the C++ source from a binary so that it can be recompiled into a working clone of that binary.

I have tried the following:

  1. compiled the binary with "-g -Og" to include DWARF info,
  2. ran objdump with "-S" and "--source-comment" to interleave the sources into the dump,
  3. grepped out all the commented source lines,
  4. removed the comment prefix, and
  5. formatted the result with clang-format.
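
Steps 3 and 4 can be sketched in C++ as a small filter over the text of the dump. This is only an illustration, not the exact commands used: the function name extractSourceLines is made up, and the sketch assumes the dump was produced with objdump's default "--source-comment" prefix, which is "# " (a hash followed by a space).

```cpp
#include <sstream>
#include <string>
#include <vector>

// Collect the source lines that objdump interleaved into a disassembly.
// Assumption: the dump uses the default --source-comment prefix "# ";
// pass a different prefix if one was given on the objdump command line.
std::vector<std::string> extractSourceLines(const std::string& dump,
                                            const std::string& prefix = "# ")
{
    std::vector<std::string> result;
    std::istringstream in(dump);
    std::string line;
    while (std::getline(in, line)) {
        // Keep only lines that start with the prefix, and strip the prefix.
        if (line.compare(0, prefix.size(), prefix) == 0) {
            result.push_back(line.substr(prefix.size()));
        }
    }
    return result;
}
```

The lines come back in the order they appear in the dump, that is, in address order rather than source order, which is where the confusion described below comes from.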

The output is fairly decent C++, but there is considerable confusion about the order of the source lines and about source lines that have no real effect (such as a function's closing "}"). Example:

 bool UsartHal1::isTransmitRegisterEmpty()
 {
     return USART1->SR & USART_SR_TXE;

     bool Usart1::write(uint8_t data)
     {
         if (UsartHal1::isTransmitRegisterEmpty())
         {
             USART1->DR = data;
             UsartHal1::write(data);
             return true;
         }
         else
         {
             return false;
         }
     }
     return USART1->SR & USART_SR_RXNE;
 }

 bool Usart1::read(uint8_t & data)
 {
     if (UsartHal1::isReceiveRegisterNotEmpty())
     {
         data = USART1->DR;
         UsartHal1::read(data);
         return true;
     }
     else
     {
         return false;
     }
 }
 return USART1->SR & USART_SR_RXNE;

I can of course imagine that what I am trying to do is simply not possible: not every source line has an effect that makes it into the binary, and there is no real reason for the compiler to guarantee that code placement follows the order of the lines in the sources.
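
One partial workaround for the ordering problem, sketched here under assumptions: if each recovered line is also tagged with the file and line number it came from (objdump's "-l" option prints such labels), the lines can be sorted back into source order and duplicates dropped. The TaggedLine record and the restoreSourceOrder function are hypothetical names for illustration; parsing the labels out of the dump is not shown. This cannot bring back lines that never generated any code.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical record: one recovered source line plus the file and line
// number it was attributed to in the dump.
struct TaggedLine {
    std::string file;
    unsigned line;
    std::string text;
};

// Sort by (file, line) and drop repeats of the same source location,
// which appear when one source line produces code at several addresses.
std::vector<TaggedLine> restoreSourceOrder(std::vector<TaggedLine> lines)
{
    auto key = [](const TaggedLine& l) { return std::tie(l.file, l.line); };
    std::stable_sort(lines.begin(), lines.end(),
                     [&](const TaggedLine& a, const TaggedLine& b) {
                         return key(a) < key(b);
                     });
    lines.erase(std::unique(lines.begin(), lines.end(),
                            [&](const TaggedLine& a, const TaggedLine& b) {
                                return key(a) == key(b);
                            }),
                lines.end());
    return lines;
}
```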

Still I am wondering if there are perhaps some options/esoteric compiler-flags that will make this possible? After all, coverage analysis tools face the same problems.


Solution

  • Debug symbols contain a lot of information that lets you map parts of the binary back to the source code (assuming you have access to both), especially in unoptimized builds. But exactly recreating the original source from the compiled binary alone is simply not possible.
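
To illustrate what that mapping amounts to, here is a heavily simplified sketch of a line-table lookup. The LineRow struct and lookupLine function are hypothetical; a real DWARF line table has more columns and explicit end-of-sequence markers. The point is that the table maps an address to a file and line number only. It does not contain the source text, which is why the original sources are still needed.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical, simplified line-table row: code starting at `address`
// was generated from `file`:`line`.
struct LineRow {
    std::uint32_t address;
    std::string file;
    unsigned line;
};

// Return the row covering `addr`, i.e. the last row whose address is
// <= addr, or nullptr if addr lies before the first row.
// Assumption: `table` is sorted by ascending address.
const LineRow* lookupLine(const std::vector<LineRow>& table, std::uint32_t addr)
{
    auto it = std::upper_bound(table.begin(), table.end(), addr,
                               [](std::uint32_t a, const LineRow& row) {
                                   return a < row.address;
                               });
    if (it == table.begin()) {
        return nullptr;
    }
    return &*std::prev(it);
}
```

Nothing stops two rows at different addresses from naming the same file and line (for example after inlining), which matches the duplicated "return" lines in the question's output.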