compiler-construction programming-languages object-code

Trace of the language on the object code


Is it possible to look at object code and tell which language was originally used to produce it? Does the language leave a trace or a stamp on the object code? Do the compilers of various languages use a fixed format for a given ISA when generating object code?


Solution

  • There is no general algorithm, but in practice it is often possible. The easiest clue is usually the set of libraries the application depends on: if a Windows application imports msvcrt.dll, for example, there is a good chance that it is a C or C++ program compiled with Visual C++. Compilers also sometimes leave identifying strings in the binary's data sections (GCC, for instance, writes its version string into the .comment section of ELF files). Here is what I see when opening a "Hello, World!"-like Haskell binary (compiled with GHC) in a hex editor:

    [screenshot: hex-editor view of the GHC-compiled binary]

    Here's what GCC's "copyright notice" looks like:

    [screenshot: GCC's notice inside the binary]

    A trained eye can even recognize the compiler version from the disassembly, since every compiler optimizes code slightly differently and has its own implementation quirks. If you need to automate this, I suggest looking into machine learning techniques.
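
    The string-based clues described above can be checked with a few lines of Python. This is only a rough sketch: it pulls printable ASCII runs out of a file (much like the Unix `strings` tool) and compares them against a small marker table. The `MARKERS` table and the names `printable_strings` and `guess_toolchains` are my own illustrative assumptions, not an authoritative signature database, and a stripped or packed binary may match nothing at all.

```python
import re
import sys

# Illustrative, hand-picked hints (an assumption, not a complete or
# authoritative list). Needles are lowercase; matching ignores case.
MARKERS = {
    "GCC": [b"gcc: ("],
    "Clang": [b"clang version"],
    "GHC (Haskell)": [b"ghc"],
    "Visual C++ runtime": [b"msvcrt.dll", b"vcruntime"],
}

# Runs of at least four printable ASCII bytes.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def printable_strings(data):
    """Yield printable ASCII runs found in raw bytes."""
    for match in PRINTABLE_RUN.finditer(data):
        yield match.group()


def guess_toolchains(data):
    """Map each marker name to the strings in `data` that triggered it."""
    hits = {}
    for text in printable_strings(data):
        lowered = text.lower()
        for name, needles in MARKERS.items():
            if any(needle in lowered for needle in needles):
                hits.setdefault(name, []).append(text.decode("ascii"))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        found = guess_toolchains(handle.read())
    for name, strings in found.items():
        print(name, "->", strings[:3])
```

    Running it as `python guess.py ./a.out` (the script name is hypothetical) prints whichever markers were found along with up to three matching strings each. A hit is a hint rather than proof: msvcrt.dll, for example, can also be imported by programs built with other toolchains.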