
Type specifications in platform ABIs


Which of these items can safely be assumed to be defined in any practically usable platform ABI?

  1. Value of CHAR_BIT

  2. Size, alignment requirements and object representation of:

    1. void*, size_t, ptrdiff_t
    2. unsigned char and signed char
    3. intptr_t and uintptr_t
    4. float, double and long double
    5. short and long long
    6. int and long (but here I expect a "no")
    7. Pointer to an object type for which the platform ABI specifies these properties
    8. Pointer to function whose type only involves types for which the platform ABI specifies these properties
  3. Object representation of a null object pointer

  4. Object representation of a null function pointer

For example, if I have a library (compiled by an unknown, but ABI-conforming compiler) which publishes this function:

void* foo(void *bar, size_t baz, void* (*qux)());

can I assume that I can safely call it from my program, regardless of which compiler I use?

Or, the other way round: if I am writing a library, is there a set of types such that, if I restrict the library's public interface to this set, the library is guaranteed to be usable on every platform where it builds?


Solution

  • The C standard contains an entire section in its annex summarizing exactly that:

    J.3 Implementation-defined behavior

    A completely random subset:

    • The number of bits in a byte

    • Which of signed char and unsigned char has the same range, representation and behavior as plain char

    • The text encodings for multibyte and wide strings

    • Signed integer representation (C23 has since mandated two's complement)

    • The result of converting a pointer to an integer and vice versa (6.3.2.3). Note that this means any pointer, not just object pointers.


    Update: To address your question about ABIs: an ABI (application binary interface) is not a standardized concept, and nothing requires an implementation to specify an ABI at all. An ABI is made up partly of the implementation-defined behaviour of the language (though not all of it; e.g. converting an out-of-range value to a signed integer type is implementation-defined, but it is not part of an ABI), and most of the implementation-defined aspects of the language are dictated by the hardware (e.g. signed integer representation, floating-point representation, size of pointers).

    However, the more important aspects of an ABI are things like how function calls work: where arguments and return values are passed (registers or stack), who is responsible for cleaning up the stack, which registers a function must preserve, etc. Two compilers must agree on these conventions for their code to be binary compatible.

    In practice, an ABI is usually the result of an implementation. Once a compiler is complete, its implementation determines an ABI. The vendor may document this ABI, and other compilers, as well as future versions of the same compiler, may choose to follow its conventions. For C implementations on x86 this has worked rather well: there are only a few, usually well-documented, free parameters that need to be communicated for code to be interoperable. For other languages, most notably C++, the picture is completely different: there is nothing resembling a standard C++ ABI. Microsoft's compiler historically broke the C++ ABI with every major release (Visual Studio 2015 and later have kept binary compatibility with each other). GCC tries hard to maintain ABI compatibility across versions and uses the published Itanium C++ ABI (ironically named after a now-dead architecture). Other compilers may do their own, completely different thing. (And then, of course, there are issues with C++ standard library implementations: does your string contain one, two, or three pointers, and in which order?)

    To summarize: many aspects of a compiler's ABI, especially pertaining to C, are dictated by the hardware architecture. Different C compilers for the same hardware ought to produce compatible binary code as long as certain aspects like function calling conventions are communicated properly. However, for higher-level languages all bets are off, and whether two different compilers can produce interoperable code has to be decided on a case-by-case basis.