Before you mark this as a duplicate, please read the question.
This may be a very stupid question, but it is bothering me. I know from reading the standard, as well as many other SO questions, that the fields of a C struct are not guaranteed to be contiguous, because the compiler may add padding. For example, the C standard says (C99 6.7.2.1, paragraph 13):
Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
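To see the padding this paragraph allows, `offsetof` from `<stddef.h>` reports where the compiler actually placed each member. This is a minimal sketch using a hypothetical `struct mixed` and hypothetical helper names. On common x86-64 and ARM ABIs `value` ends up at offset 4, but the standard itself only guarantees that `tag` is at offset 0 and that `value` comes after it.

```c
#include <stddef.h>

/* Hypothetical struct whose members have different sizes and alignments. */
struct mixed {
    char tag;   /* 1 byte */
    int value;  /* typically 4 bytes with 4-byte alignment */
};

/* Bytes of padding the compiler inserted between tag and value.
   The standard guarantees tag is at offset 0 and value comes after it,
   but not that value starts at offset 1. */
size_t padding_after_tag(void) {
    return offsetof(struct mixed, value)
         - (offsetof(struct mixed, tag) + sizeof(char));
}

/* Padding after the last member, which keeps value aligned when
   struct mixed objects are placed in an array. */
size_t trailing_padding(void) {
    return sizeof(struct mixed)
         - (offsetof(struct mixed, value) + sizeof(int));
}
```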
I was writing a program similar to the Unix readelf and nm utilities, just for fun, and it involves a lot of reading values from bytes at specific offsets in the file. For example, the first 64 bytes of a 64-bit ELF object file contain the "file header". Bytes 0x00-0x04 of the file header encode an int, while bytes 0x20-0x28 encode an 8-byte file offset, etc. However, I noticed in the original implementation of readelf.c that the programmer does something like this:
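For comparison, the byte-offset approach can be done portably by decoding each field from the raw buffer instead of overlaying a struct. This is a minimal sketch, assuming a 64-bit little-endian ELF file; the macro names and the helpers `read_u64_le` and `elf64_le_phoff` are hypothetical. The offsets used (class at 0x04, data encoding at 0x05, program header table offset at 0x20) follow the 64-bit ELF header layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field offsets in a 64-bit ELF header (hypothetical macro names). */
#define ELF64_HEADER_SIZE 64
#define OFF_CLASS 0x04  /* 1 byte: 2 means 64-bit */
#define OFF_DATA  0x05  /* 1 byte: 1 means little-endian */
#define OFF_PHOFF 0x20  /* 8 bytes: file offset of the program header table */

/* Decode a little-endian 64-bit value one byte at a time, so the result
   depends on neither the host's byte order nor the pointer's alignment. */
static uint64_t read_u64_le(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Returns 1 and stores the program header offset in *phoff if buf holds
   a 64-bit little-endian ELF header; returns 0 otherwise.
   len is the number of bytes available in buf. */
int elf64_le_phoff(const unsigned char *buf, size_t len, uint64_t *phoff) {
    static const unsigned char magic[4] = {0x7F, 'E', 'L', 'F'};
    if (len < ELF64_HEADER_SIZE || memcmp(buf, magic, 4) != 0)
        return 0;
    if (buf[OFF_CLASS] != 2 || buf[OFF_DATA] != 1)
        return 0;
    *phoff = read_u64_le(buf + OFF_PHOFF);
    return 1;
}
```

This style costs a helper per field but never depends on how the compiler lays out a struct.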
First, they declare a struct (let's call it ELF_H) with fields corresponding to the things in the file header (i.e. the first field is an int, just like the first 4 bytes of the file header; the second is a char, because bytes 0x04-0x05 of the ELF header encode a char; etc.). Then they copy the entire ELF file into memory and cast the pointer to the start of that memory to a pointer to ELF_H. Something like:
#include <stdio.h>
#include <stdlib.h>

FILE *file = fopen("filename", "rb");
void *start_of_file = malloc(/* size_of_file */);
fread(start_of_file, 1, /* size_of_file */, file); // copies the entire file into memory
ELF_H hdr = *(ELF_H *) start_of_file; // cast to a pointer-to-struct and dereference
After doing this, they just access each field of the header through the struct's members. So instead of using pointer arithmetic to get what is at byte 0x04, they just write hdr.member2 (the second member of the struct, following the first one, which is an int).
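The struct-overlay approach can be sketched with a hand-written mirror of the 64-bit ELF header. The name `struct elf64_header` and the helper `load_header` are hypothetical; on Linux, `<elf.h>` provides the real `Elf64_Ehdr`. This sketch copies the bytes with `memcpy` rather than dereferencing a cast pointer, which sidesteps alignment and strict-aliasing questions, but it still only yields correct values when the struct has no padding and the file's byte order matches the host's.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the 64-bit ELF header. The comments give each
   field's offset in the file; every field sits at a multiple of its size. */
struct elf64_header {
    unsigned char e_ident[16]; /* 0x00: magic, class, data encoding, ... */
    uint16_t e_type;           /* 0x10 */
    uint16_t e_machine;        /* 0x12 */
    uint32_t e_version;        /* 0x14 */
    uint64_t e_entry;          /* 0x18 */
    uint64_t e_phoff;          /* 0x20 */
    uint64_t e_shoff;          /* 0x28 */
    uint32_t e_flags;          /* 0x30 */
    uint16_t e_ehsize;         /* 0x34 */
    uint16_t e_phentsize;      /* 0x36 */
    uint16_t e_phnum;          /* 0x38 */
    uint16_t e_shentsize;      /* 0x3A */
    uint16_t e_shnum;          /* 0x3C */
    uint16_t e_shstrndx;       /* 0x3E */
};

/* Copy the first sizeof(struct elf64_header) bytes of an in-memory file
   image into a header struct. The caller must ensure at least that many
   bytes are available. */
struct elf64_header load_header(const void *file_image) {
    struct elf64_header hdr;
    memcpy(&hdr, file_image, sizeof hdr);
    return hdr;
}
```

After `struct elf64_header h = load_header(start_of_file);`, a field such as `h.e_phoff` reads the 8 bytes at offset 0x20, the same bytes the byte-offset approach decodes by hand.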
How is this meant to work if fields in a struct aren't guaranteed to be contiguous?
The closest answer I could find was here, but in that example all the members of the struct have the same type. In ELF_H, they have different types.
Thank you in advance :)
How is this meant to work if fields in a struct aren't guaranteed to be contiguous?
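The standard doesn't require struct members to be contiguous, but that doesn't mean structs are laid out at random or in unpredictable ways. The compiler lays out structs according to the platform's Application Binary Interface (ABI), which specifies the size and alignment of each type and therefore where padding goes. On GNU/Linux, GCC follows the System V ABI for the target architecture, and in the ELF header every field already sits at an offset that is a multiple of its own size. Under those alignment rules no padding is needed, so the struct's layout in memory matches the header's layout in the file byte for byte (as long as the file's byte order also matches the host's).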
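As a rough model of the layout rule such ABIs use: each member goes at the next offset that is a multiple of its alignment, and the total size is rounded up to the largest alignment. This is a simplified sketch, not the ABI text itself. The `struct member` type and the `layout` helper are hypothetical, and it ignores bit-fields and assumes the scalar fields' alignment equals their size, as on x86-64.

```c
#include <stddef.h>

/* Size and alignment of one struct member (hypothetical type). */
struct member { size_t size; size_t align; };

/* Computes each member's offset into offs[] and returns the padded struct
   size. Each align value must be a power of two. */
size_t layout(const struct member *m, size_t n, size_t *offs) {
    size_t off = 0, max_align = 1;
    for (size_t i = 0; i < n; i++) {
        /* Round the running offset up to this member's alignment. */
        off = (off + m[i].align - 1) & ~(m[i].align - 1);
        offs[i] = off;
        off += m[i].size;
        if (m[i].align > max_align)
            max_align = m[i].align;
    }
    /* Trailing padding so the size is a multiple of the largest alignment. */
    return (off + max_align - 1) & ~(max_align - 1);
}
```

Feeding it the ELF64 header's fields (a 16-byte `e_ident` array with alignment 1, then 2-, 4- and 8-byte scalars) never triggers a round-up, so every offset equals the file offset and the size comes out at exactly 64. A `char` followed by an `int` gets offsets 0 and 4 instead.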
In other words, for any given ABI and compiler combination, you can predict whether the method you describe will work. It isn't guaranteed to work by the standard, but it can be guaranteed by the ABI, as long as the struct's layout under that ABI matches the file format.
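One way to make that prediction checkable is to encode it as compile-time assertions, so a compiler or ABI that lays the struct out differently fails the build instead of silently reading the wrong bytes. This is a sketch using C11 `_Static_assert` with a hypothetical `struct elf64_hdr`; the expected offsets come from the 64-bit ELF header layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the 64-bit ELF header. */
struct elf64_hdr {
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version;
    uint64_t e_entry, e_phoff, e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
};

/* Each assertion pins a member to its offset in the file format. */
_Static_assert(offsetof(struct elf64_hdr, e_type) == 0x10, "e_type offset");
_Static_assert(offsetof(struct elf64_hdr, e_version) == 0x14, "e_version offset");
_Static_assert(offsetof(struct elf64_hdr, e_entry) == 0x18, "e_entry offset");
_Static_assert(offsetof(struct elf64_hdr, e_phoff) == 0x20, "e_phoff offset");
_Static_assert(offsetof(struct elf64_hdr, e_flags) == 0x30, "e_flags offset");
_Static_assert(offsetof(struct elf64_hdr, e_shstrndx) == 0x3E, "e_shstrndx offset");
_Static_assert(sizeof(struct elf64_hdr) == 64, "header must be 64 bytes");
```

These checks cover layout only; byte order still has to be checked at run time (for example from the data-encoding byte at offset 0x05).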