elf, portable-executable, mach-o

What is the difference between executable formats?


Are there any major differences between PE, ELF and Mach-O? I mean, does one have capabilities the others don't? Can one carry more information than the others? Or are they just different container formats for the same info?
I am not very knowledgeable on this, but it seems to me that they all carry text (code) sections, initialized and uninitialized data sections, etc. as well as relocation, symbol, string, import and export tables.

I am not asking about minor differences, such as that format X can split a data section or that format Y can be more efficiently parsed in hardware.
I am asking about major differences, such that they might affect the choice for a new general-purpose OS. Or that if a platform had a loader for all 3 formats, would it be trivial to convert from one format to the other by just "repackaging" the sections and rewriting the tables to the new format.


Solution

  • I am asking about major differences, such that they might affect the choice for a new general-purpose OS.

    In the long run, there are few major differences between them. Each tends to eventually gain whatever features it lacks compared to the others, and they are all converging towards feature parity over time, at least for those features which are widely considered important nowadays.

    Some unique features:

    • While it isn't strictly part of the Mach-O format, Apple platforms support "fat binaries" (aka universal binaries), where a single file contains multiple Mach-O executables/libraries for different CPU architectures. With this you can put 32-bit and 64-bit images in the same file, you can put images for ARM, x86 and PPC in the same file, and you can even put multiple images for the same bitness and architecture but optimised for different CPU features (e.g. availability of AVX-512 instructions).
    • Since fat binaries aren't strictly speaking part of Mach-O, there is no reason in principle why you couldn't use the fat binary header in conjunction with some other executable format such as ELF or PE-COFF. Nothing existing will load it – but if we are talking about a "new general-purpose OS", such an OS might. (Except, if anyone actually did that, probably they should change the magic number – existing tools will be confused to see a fat binary containing something other than Mach-O.)
    • PE-COFF supports something like "universal binaries", but only where the other binary is an MS-DOS MZ executable. This is a rather useless feature nowadays, but it had value earlier in the history of Windows (such as the Windows 95 era), when many people still used DOS regularly and would try to run Windows executables from DOS. Normally the MZ executable is just a "stub" which does nothing but print an error message and exit, but you could actually embed a functional MS-DOS version of your app in it. The idea is not exclusive to PE-COFF: it is also found in the older NE executable format used in Windows 1.x-3.x and OS/2 1.x, and in the LE/LX executable formats used by OS/2 2.x+. In principle you could stick an ELF or Mach-O on the end of your MZ executable too, but nothing existing is going to run it.
    • The Mach-O ecosystem has a concept of "shared caches", where Apple merges all the system libraries into a single giant file. Similarly (since macOS 11), the OS kernel and kernel extensions (kexts, the equivalent of loadable modules on Linux or drivers on Windows) are merged into a single giant "kernel cache" file. This improves OS and application load times. The feature is only really for the OS vendor to use, although since Apple has open sourced the code for it, a competing OS which wanted to use Mach-O could adopt it. I'm not aware of any equivalent in ELF or PE-COFF.
    • ELF binaries contain segments divided into sections, the sections have names but the segments don't. By contrast, in Mach-O, both segments and sections have names. Not really important (nobody really needs to name their segments), but a difference nonetheless. (PE-COFF binaries don't contain segments, only named sections.)
    • Personally, I think the structure of Mach-O headers (load commands) is more elegant than that of ELF or PE-COFF. That's just my personal opinion though; others may disagree. And those kinds of aesthetic concerns are probably not that important in the grand scheme of things.
    • PE-COFF binaries have support for embedded resources (images, etc). There is nothing stopping you from embedding resources (even an embedded ZIP file) in an ELF or Mach-O binary, but the binary format itself has no explicit support for them. For Mach-O, Apple encourages having an "application bundle" directory structure, where the resources are not embedded in the executable but rather are individual files in adjacent directories. Some people really want their whole app to be just one file though.
    • PE-COFF and Mach-O are each de facto tied to a single vendor. ELF belongs more to the community as a whole (its official owner is the "Tool Interface Standard (TIS) Committee", but I don't think that committee has actually existed for over 20 years). If you need to extend ELF, you'll probably find it easier to get your extensions accepted by LLVM/GCC/etc, without needing to beg Apple or Microsoft for their blessing.
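
    To make the container differences above a bit more concrete, here is a minimal sketch of telling the formats apart by their leading magic bytes, including the fat binary header and the MZ stub that precedes a PE image. The function name `identify` and the returned labels are made up for illustration, the Java class file check is only a heuristic, and nothing past the first few header fields is validated.

```python
import struct

ELF_MAGIC = b"\x7fELF"

# Mach-O magics as seen when the first four bytes are read big-endian.
MACHO_MAGICS = {
    0xFEEDFACE: "Mach-O 32-bit big-endian",
    0xFEEDFACF: "Mach-O 64-bit big-endian",
    0xCEFAEDFE: "Mach-O 32-bit little-endian",
    0xCFFAEDFE: "Mach-O 64-bit little-endian",
}

FAT_MAGIC = 0xCAFEBABE  # fat (universal) header; its fields are big-endian

# Java class files also start with 0xCAFEBABE. Their bytes 4..7 hold the
# minor/major version, and the major version is at least 45, so a small
# count in that position suggests a fat binary instead (heuristic only).
JAVA_MIN_MAJOR = 45


def identify(data: bytes) -> str:
    """Return a rough label for an executable image based on its magic bytes."""
    if data[:4] == ELF_MAGIC:
        return "ELF"

    if len(data) >= 4:
        (magic,) = struct.unpack(">I", data[:4])
        if magic in MACHO_MAGICS:
            return MACHO_MAGICS[magic]
        if magic == FAT_MAGIC and len(data) >= 8:
            (count,) = struct.unpack(">I", data[4:8])
            if count < JAVA_MIN_MAJOR:
                return f"fat binary with {count} slice(s)"
            return "Java class file (shares the 0xCAFEBABE magic)"

    # PE, NE, LE and LX images all sit behind an MZ (DOS) header whose
    # 32-bit little-endian field at offset 0x3C points to the real header.
    if data[:2] == b"MZ" and len(data) >= 0x40:
        (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
        if data[e_lfanew:e_lfanew + 4] == b"PE\0\0":
            return "PE"
        signature = data[e_lfanew:e_lfanew + 2]
        if signature in (b"NE", b"LE", b"LX"):
            return signature.decode("ascii") + " (behind an MZ stub)"
        return "plain MZ (DOS) executable"

    return "unknown"
```

    For example, `identify(open(path, "rb").read(4096))` on a typical Linux binary should return "ELF". Note that this only inspects the outer container; a fat binary's slices would each need to be identified again at the offsets recorded in the fat header.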

    Overall, if you were starting a new OS from scratch, and weren't particularly concerned about compatibility with Apple or Microsoft, ELF is probably the answer. If you look at the OS research community and the hobbyist OS development community, it is the most common choice. If things go wrong, ELF is the easiest format to get help with, and the easiest to find people with in-depth experience of. And if you really need any of the features missing from ELF, you could always define them as your own extensions.

    It is also worth noting that there are yet other executable formats still in use, even if more obscure than the "big 3". IBM AIX uses XCOFF, which like Microsoft's PE-COFF is an evolution of AT&T's original COFF format (which AT&T later replaced with ELF), but a divergent evolution. IBM z/OS uses GOFF, which is something entirely particular to IBM. If we start looking at systems no longer in common use, myriad other formats emerge, but there is a definite tendency to move away from "let's invent our own executable format". Even systems which used to use some custom format sometimes end up migrating to ELF. A good example of that is OpenVMS, which used its own proprietary executable format on VAX and Alpha, but adopted ELF with the move to Itanium (a decision continued by the new x86-64 port). Inventing a new executable format is most likely a waste of time, unless it has some compelling advantage over existing formats, or is aimed at a very different use case from them (see the WebAssembly module format for a recent, entirely justifiable example of "invent a new format").