Tags: c, struct, bit-fields, endianness

How Are Little Endian Structs With Bitfields and Longwords Stored?


I understand that a word 0x1234, when stored as little-endian, becomes 0x3412 in memory. I am also seeing that the byte 0x12, declared as bitfields a:4 and b:4, is stored as 0x21. But what about something more complex? Given data like 0x1700581001FFFFFF and the struct ordering below, I see the data stored as 0x7180051001FFFFFF, which makes very little sense to me. It seems 'a' and 'b' got swapped but stayed at the beginning of the struct, 'g' stayed at the end, and the fields in between went through other seemingly random swaps. Why? Also, I left the "LONGWORD" notation in because that is what the code says. I'm not sure how 4 bits can be a longword, but perhaps that has something to do with this craziness?

LONGWORD a: 4
LONGWORD b: 4
LONGWORD c: 4
LONGWORD d: 12
LONGWORD e: 8
LONGWORD f: 8
LONGWORD g: 24

Solution

  • Upon re-reading the documentation, I do not see any allowance for packing the bitfields in an arbitrary order. There is a specified ordering, but which direction it runs is implementation-dependent. It is still quite determinable, though. In short, from what I am seeing, the compiler packs the bits in groups of 8, IN ORDER. The difference for our Little Endian compiler (or maybe some option somewhere) is that when it concatenates the bits it puts the first-defined bits AFTER the next-defined bits (i.e. it makes the first-defined field less significant than the next-defined one). For example:

    a:3 = 111 (binary)
    b:4 = 0000 (binary)
    c:9 = 011111111 (binary)
    

    Our Little Endian compiler (or, again, perhaps some other option) takes the 3 bits of 'a' and concatenates them with 'b' by placing 'a' after 'b'. I believe this is the opposite of what Big Endian compilers do, which is to put 'a' before 'b'. I'm speculating that endianness is the cause, but either way ours produces the 7 bits 0000111 by forming 'ba' rather than 'ab'. It then needs one more bit from 'c' to make a full 8. It takes the least significant bit of 'c', which is a 1, and again the earlier bits are tacked on after the new bit, giving 10000111. This byte, 0x87, is stored to memory, and the next 8 bits are taken. Here those 8 bits are 01111111, so the byte 0x7F is stored after the 0x87. Memory now holds 0x877F. The other method (likely Big Endian) would have ended up with 0xE0FF.

    The 0x877F now in memory, if read back as a Little Endian word, has the value 0x7F87 or, in binary, 0111111110000111. That is exactly the fields above concatenated in reverse order, 'cba'.
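
The packing in this small example can be reproduced by hand, without relying on any particular compiler's bit-field layout. The following is a minimal sketch in C, not the compiler's actual algorithm. The names `struct field`, `pack_first_field_low`, `pack_first_field_high`, `store_le`, and `store_be` are made up for illustration, and each field is assumed to be 1 to 63 bits wide, with the total fitting in 64 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* One bit-field: its value and its width in bits (assumed 1..63). */
struct field {
    uint64_t value;
    unsigned width;
};

/* First-declared field goes into the LEAST significant bits ("cba" order). */
static uint64_t pack_first_field_low(const struct field *f, size_t n)
{
    uint64_t word = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t mask = (UINT64_C(1) << f[i].width) - 1;
        word |= (f[i].value & mask) << shift;
        shift += f[i].width;
    }
    return word;
}

/* First-declared field goes into the MOST significant bits ("abc" order). */
static uint64_t pack_first_field_high(const struct field *f, size_t n)
{
    uint64_t word = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t mask = (UINT64_C(1) << f[i].width) - 1;
        word = (word << f[i].width) | (f[i].value & mask);
    }
    return word;
}

/* Write the low nbytes (at most 8) of word, least significant byte first. */
static void store_le(uint64_t word, size_t nbytes, unsigned char *out)
{
    for (size_t i = 0; i < nbytes; i++)
        out[i] = (unsigned char)(word >> (8 * i));
}

/* Write the low nbytes (at most 8) of word, most significant byte first. */
static void store_be(uint64_t word, size_t nbytes, unsigned char *out)
{
    for (size_t i = 0; i < nbytes; i++)
        out[i] = (unsigned char)(word >> (8 * (nbytes - 1 - i)));
}
```

Packing the fields {0x7, 3}, {0x0, 4}, {0xFF, 9} with `pack_first_field_low` gives 0x7F87, which `store_le` writes as the bytes 0x87 0x7F. `pack_first_field_high` gives 0xE0FF, which `store_be` writes as 0xE0 0xFF.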

    So let's apply the same reverse ordering to the data I provided earlier. (0x1700581001FFFFFF was meant to be parsed as below, but that may not have been obvious, since I had assumed a Big Endian layout.)

    LONGWORD a: 4 = 0x1
    LONGWORD b: 4 = 0x7
    LONGWORD c: 4 = 0x0
    LONGWORD d: 12 = 0x058
    LONGWORD e: 8 = 0x10
    LONGWORD f: 8 = 0x01
    LONGWORD g: 24 = 0xFFFFFF
    

    With our Little Endian configuration, this can be interpreted as one giant structure with the value 0xFFFFFF0110058071, formed by concatenating in the order gfedcba. Storing that value to memory in Little Endian format gives 0x7180051001FFFFFF, which is exactly the data I said we were seeing. Big Endian, in theory, would have used the order I assumed was obvious (0x1700581001FFFFFF), both as interpreted and as stored.
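
As a check on the arithmetic, the two concatenation orders can be written out with explicit shifts. This is a sketch under assumptions: `struct example_fields`, `concat_gfedcba`, `concat_abcdefg`, `dump_le64`, and `dump_be64` are hypothetical names, and the code models the layout by hand rather than using real bit-fields, whose layout is implementation-defined in C.

```c
#include <stdint.h>

/* Field values for the example; widths are 4, 4, 4, 12, 8, 8, 24 bits. */
struct example_fields {
    uint32_t a, b, c, d, e, f, g;
};

/* Order g f e d c b a: 'a' ends up in the lowest 4 bits. */
static uint64_t concat_gfedcba(const struct example_fields *x)
{
    return  (uint64_t)(x->a & 0xF)
          | (uint64_t)(x->b & 0xF)      << 4
          | (uint64_t)(x->c & 0xF)      << 8
          | (uint64_t)(x->d & 0xFFF)    << 12
          | (uint64_t)(x->e & 0xFF)     << 24
          | (uint64_t)(x->f & 0xFF)     << 32
          | (uint64_t)(x->g & 0xFFFFFF) << 40;
}

/* Order a b c d e f g: 'a' ends up in the highest 4 bits. */
static uint64_t concat_abcdefg(const struct example_fields *x)
{
    return  (uint64_t)(x->a & 0xF)      << 60
          | (uint64_t)(x->b & 0xF)      << 56
          | (uint64_t)(x->c & 0xF)      << 52
          | (uint64_t)(x->d & 0xFFF)    << 40
          | (uint64_t)(x->e & 0xFF)     << 32
          | (uint64_t)(x->f & 0xFF)     << 24
          | (uint64_t)(x->g & 0xFFFFFF);
}

/* Store a 64-bit value least significant byte first. */
static void dump_le64(uint64_t v, unsigned char out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(v >> (8 * i));
}

/* Store a 64-bit value most significant byte first. */
static void dump_be64(uint64_t v, unsigned char out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(v >> (8 * (7 - i)));
}
```

With a = 0x1, b = 0x7, c = 0x0, d = 0x058, e = 0x10, f = 0x01, g = 0xFFFFFF, `concat_gfedcba` gives 0xFFFFFF0110058071 and `dump_le64` writes it as 71 80 05 10 01 FF FF FF. `concat_abcdefg` gives 0x1700581001FFFFFF, which `dump_be64` writes as 17 00 58 10 01 FF FF FF. If LONGWORD is a typedef for a 32-bit unsigned type (an assumption; the source does not show its definition), the fields also split evenly into two 32-bit units, a through e and then f and g, and storing each unit little-endian gives the same eight bytes.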

    Well, it makes sense to me. Hopefully it makes sense to someone else trying to figure out the same thing! I still don't get why they all say LONGWORD before them though...