programming-languages, binary, disassembly

Determine source language from a binary?


I responded to another question about developing for the iPhone in non-Objective-C languages, and I asserted that using, say, C# to write for the iPhone might not sit well with an Apple reviewer. I was speaking largely about UI elements differing between the ObjC and C# libraries in question, but a commenter made an interesting point, leading me to this question:

Is it possible to determine the language a program is written in, solely from its binary? If there are such methods, what are they?

Let's assume for the purposes of the question:

  • That from an interaction standpoint (console behavior, any GUI appearance, etc.) the two are identical.
  • That performance isn't a reliable indicator of language (no comparing, say, Java to C).
  • That you don't have an interpreter or something between you and the language - just raw executable binary.

Bonus points if your answer is as language-agnostic as possible.


Solution

  • I'm not a compiler hacker (someday, I hope), but I figure that you may be able to find telltale signs in a binary file that would indicate what compiler generated it and some of the compiler options used, such as the level of optimization specified.

    Strictly speaking, however, what you're asking is impossible. It could be that somebody sat down with a pen and paper and worked out the binary codes corresponding to the program that they wanted to write, and then typed that stuff out in a hex editor. Basically, they'd be programming in assembly without the assembler tool. Similarly, you may never be able to tell with certainty whether a native binary was written in straight assembler or in C with inline assembly.

    As for virtual machine environments such as the JVM and .NET, I would expect that you can identify the VM from the byte codes in the binary executable. However, you may not be able to tell what the source language was, such as C# versus Visual Basic, unless there are particular compiler quirks that tip you off.
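
To make the "telltale signs" idea concrete, here is a minimal Python sketch, not a definitive detector. It checks the leading magic bytes to guess the container format, then scans for a few strings that some toolchains are known to leave behind (for example, GCC writes a `GCC: (` ident string into ELF `.comment` sections). The function names and the marker list are my own illustrative choices, and every hit is only a hint: markers can be stripped, forged, or simply absent, and the `CA FE BA BE` magic is shared by Java class files and Mach-O universal binaries.

```python
import sys

# Leading magic bytes for a few common container formats.
MAGIC_FORMATS = [
    (b"\x7fELF", "ELF"),
    (b"MZ", "PE/DOS (native Windows or .NET assembly)"),
    (b"\xca\xfe\xba\xbe", "Java class file or Mach-O universal binary"),
    (b"\xcf\xfa\xed\xfe", "Mach-O 64-bit (little-endian)"),
    (b"\xce\xfa\xed\xfe", "Mach-O 32-bit (little-endian)"),
]

# Strings that some toolchains tend to embed. Illustrative, not exhaustive,
# and easy to strip or fake, so treat every match as a hint only.
TOOLCHAIN_MARKERS = [
    (b"GCC: (", "GCC ident string"),
    (b"clang version", "Clang ident string"),
    (b"Go buildinf:", "Go build info marker"),
    (b"/rustc/", "Rust standard library source path"),
    (b"mscoree.dll", ".NET runtime import"),
]


def identify_format(data):
    """Guess the container format from the first few bytes."""
    for magic, name in MAGIC_FORMATS:
        if data.startswith(magic):
            return name
    return "unknown"


def find_toolchain_hints(data):
    """Return labels for every known marker string found in the bytes."""
    return [label for marker, label in TOOLCHAIN_MARKERS if marker in data]


if __name__ == "__main__":
    # Usage: python sniff.py /path/to/binary
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            contents = f.read()
        print("format:", identify_format(contents))
        print("hints:", find_toolchain_hints(contents))
```

An empty hint list says nothing about the source language. It only means none of these particular strings survived, which is exactly what you would see for a stripped binary or one typed out by hand in a hex editor.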