Search code examples
cstructbinary-compatibility

Binary compatibility of struct in separately compiled code


Given a CPU architecture, is the exact binary form of a struct determined exactly?

For example, struct stat64 is used by glibc and the Linux kernel. I see glibc define it in sysdeps/unix/sysv/linux/x86/bits/stat.h as:

struct stat64 {
    __dev_t st_dev;      /* Device.  */
# ifdef __x86_64__
    __ino64_t st_ino;    /* File serial number.  */
    __nlink_t st_nlink;  /* Link count.  */
/* ... et cetera ... */
}

My kernel was compiled already. Now when I compile new code using this definition, they have binary compatibility. Where is this guaranteed? The only guarantees I know of are:

  1. The first element has offset 0
  2. Elements declared later have higher offsets

So if the kernel code declares struct stat64 in the exact same way (in the C code), then I know that the binary form has:

  1. st_dev @ offset 0
  2. st_ino @ offset at least sizeof(__dev_t)

But I'm not currently aware of any way to determine the offset of st_ino. Kernighan & Ritchie give the simple example

struct X {
  char c;
  int i;
}

where on my x86-64 machine, offsetof(struct X, i) == 4. Perhaps there are some general alignment rules that determine the exact binary form of a struct for each CPU architecture?


Solution

  • Given a CPU architecture, is the exact binary form of a struct determined exactly?

    No, the representation or layout (“binary form”) of a structure is ultimately determined by the C implementation, not by the CPU architecture. Most C implementations intended for normal purposes follow recommendations provided by the manufacturer and/or the operating system. However, there may be circumstances where, for example, a certain alignment for a particular type might give slightly better performance but is not required, and so one C implementation might choose to require that alignment while another does not, and this can result in different structure layout.

    In addition, a C implementation might be designed for special purposes, such as providing compatibility with legacy code, in which case it might choose to replicate the alignment of some old compiler for another architecture rather than to use the alignment required by the target processor.

    However, let’s consider structures in separate compilations using one C implementation. Then C 2018 6.2.7 1 says:

    … Moreover, two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements: If one is declared with a tag, the other shall be declared with the same tag. If both are completed anywhere within their respective translation units, then the following additional requirements apply: there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types; if one member of the pair is declared with an alignment specifier, the other is declared with an equivalent alignment specifier; and if one member of the pair is declared with a name, the other is declared with the same name. For two structures, corresponding members shall be declared in the same order. For two structures or unions, corresponding bit-fields shall have the same widths…

    Therefore, if two structures are declared identically in separate translation units, or with the minor variations permitted in that passage, then they are compatible, which effectively means they have the same layout or representation.

    Technically, that passage applies only to separate translation units of the same program. The C standard defines behaviors for one program; it does not explicitly define interactions between programs (or fragments of programs, such as kernel extensions) and the operating system, although to some extent you might consider the operating system and everything running in it as one program. However, for practical purposes, it applies to everything compiled with that C implementation.

    This means that as long as you use the same C implementation as the kernel is compiled with, identically declared structures will have the same representation.

    Another consideration is that we might use different compilers for compiling the kernel and compiling programs. The kernel might be compiled with Clang while a user prefers to use GCC. In this case, it is a matter for the compilers to document their behaviors. The C standard does not guarantee compatibility, but the compilers can, if they choose to, perhaps by both documenting that they adhere to a particular Application Binary Interface (ABI).

    Also note that a “C implementation” as discussed above is not just a particular compiler but a particular compiler with particular switches. Various switches may change how a compiler behaves in ways that cause to be effectively a different C implementation, such as switches to conform to one version of the C standard or another, switches affecting whether structures are packed, switches affecting sizes of integer types, and so on.