different between C struct bitfields on char and on int

When using bitfields in C, I found out differences I did not expect related to the actual type that is used to declare the fields.

I didn't find any clear explanation. Now, the problem is identified, so if though there is no clear response, this post may be useful to anyone facing the same issue. Still if some can point to a formal explanation, this coudl be great.

The following structure, takes 2 bytes in memory.

struct {
  char field0 : 1; // 1 bit  - bit 0 
  char field1 : 2; // 2 bits - bits 2 down to 1
  char field2 ;    // 8 bits - bits 15 down to 8
} reg0;

This one takes 4 bytes in memory, the question is why ?

struct {
  int  field0 : 1; // 1 bit  - bit 0 
  int  field1 : 2; // 2 bits - bits 2 down to 1
  char field2 ;    // 8 bits - bits 15 down to 8
} reg1;

In both cases, the bits are organized in memory in the same way: field 2 is always taking bits 15 down to 8.

I tried to find some literarure on the subject, but still can't get a clear explanation.

The two most appropriate links I can found are:

However, none really explains really why the second structure is taking 4 bytes. Actually reading carefully, I would even expect the structure to take 2 bytes.

In both cases,

field0 takes 1 bit
field1 takes 2 bits
field2 takes 8 bits, and is aligned on the first available byte address

Hence, the useful data requires 2 bytes in both cases.

So what is behind the scene that makes reg1 to take 4 bytes ?

Full Code Example:

#include "stdio.h"
// Register Structure using char
typedef struct {
    // Reg0
    struct _reg0_bitfieldsA {
      char field0 : 1;
      char field1 : 2;
      char field2 ;
    } reg0;

    // Nextreg
    char NextReg;

} regfileA_t;

// Register Structure using int
typedef struct {
    // Reg1
    struct  _reg1_bitfieldsB {
      int field0 : 1;
      int field1 : 2;
      char field2 ;
    } reg1;

    // Reg
    char NextReg;
} regfileB_t;


regfileA_t regsA;
regfileB_t regsB;


int main(int argc, char const *argv[])
{
    int* ptrA, *ptrB;

    printf("sizeof(regsA) == %-0d\n",sizeof(regsA));   // prints 3 - as expected
    printf("sizeof(regsB) == %-0d\n",sizeof(regsB));   // prints 8 - why ?
    printf("\n");
    printf("sizeof(regsA.reg0) == %-0d\n",sizeof(regsA.reg0)); // prints 2 - as epxected
    printf("sizeof(regsB.reg0) == %-0d\n",sizeof(regsB.reg1)); // prints 4 - int bit fields tells the struct to use 4 bytes then.
    printf("\n");
    printf("addrof(regsA.reg0) == 0x%08x\n",(int)(&regsA.reg0));     // 0x0804A028
    printf("addrof(regsA.reg1) == 0x%08x\n",(int)(&regsA.NextReg));  // 0x0804A02A = prev + 2
    printf("addrof(regsB.reg0) == 0x%08x\n",(int)(&regsB.reg1));     // 0x0804A020
    printf("addrof(regsB.reg1) == 0x%08x\n",(int)(&regsB.NextReg));  // 0x0804A024 = prev + 4 - my register is not at the righ place then.
    printf("\n");

    regsA.reg0.field0 = 1;
    regsA.reg0.field1 = 3;
    regsA.reg0.field2 = 0xAB;

    regsB.reg1.field0 = 1;
    regsB.reg1.field1 = 3;
    regsB.reg1.field2 = 0xAB;

    ptrA = (int*)&regsA;
    ptrB = (int*)&regsB;
    printf("regsA.reg0.value == 0x%08x\n",(int)(*ptrA)); // 0x0000AB07 (expected)
    printf("regsB.reg0.value == 0x%08x\n",(int)(*ptrB)); // 0x0000AB07 (expected)

    return 0;
}

When I first write the struct I expected to get reg1 to take only 2 bytes, hence the next register was at the offset = 2.

Solution

The relevant part of the standard is C11/C17 6.7.2.1p11:

An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

that, in connection with C11/C17 6.7.2.1p5

A. bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type. It is implementation-defined whether atomic types are permitted.

and that you're using char means that there is nothing to discuss in general - for a specific implementation check the compiler manuals. Here's the one for GCC.

From the 2 excerpts it follows that an implementation is free to use absolutely whatever types it wants to to implement the bitfields - it could even use int64_t for both of these cases having the structure of size 16 bytes. The only thing a conforming implementation must do is to pack the bits within the chosen addressable storage unit if enough space remains.

For GCC on System-V ABI on 386-compatible (32-bit processors), the following stands:

Plain bit-fields (that is, those neither signed nor unsigned) always have non- negative values. Although they may have type char, short, int, long, (which can have negative values), these bit-fields have the same range as a bit-field of the same size with the corresponding unsigned type. Bit-fields obey the same size and alignment rules as other structure and union members, with the following additions:

Bit-fields are allocated from right to left (least to most significant).

A bit-field must entirely reside in a storage unit appropriate for its declared type. Thus a bit-field never crosses its unit boundary.

Bit-fields may share a storage unit with other struct/union members, including members that are not bit-fields. Of course, struct members occupy different parts of the storage unit.

Unnamed bit-fields' types do not affect the alignment of a structure or union, although individual bit-fields' member offsets obey the alignment constraints.

i.e. in System-V ABI, 386, int f: 1 says that the bit-field f must be within an int. If entire bytes of space remains, a following char within the same struct will be packed inside this int, even if it is not a bit-field.

Using this knowledge, the layout for

struct {
  int  a : 1; // 1 bit  - bit 0 
  int  b : 2; // 2 bits - bits 2 down to 1
  char c ;    // 8 bits - bits 15 down to 8
} reg1;

will be

                     1                     2                 3  
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
|a b b x x x x x|c c c c c c c c|x x x x x x x x|x x x x x x x x|               

<------------------------------ int ---------------------------->

and the layout for

struct {
  char  a : 1; // 1 bit  - bit 0 
  char b : 2; // 2 bits - bits 2 down to 1
  char c ;    // 8 bits - bits 15 down to 8
} reg1;

will be

                    1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 
|a b b x x x x x|c c c c c c c c|

<---- char ----><---- char ---->

So there are tricky edge cases. Compare the 2 definitions here:

struct x {
    short a : 2;  
    short b : 15;
    char  c ; 
};

struct y {
    int a : 2;  
    int b : 15;
    char  c ; 
};

Because the bit-field must not cross the unit boundary, the struct x members a and b need to go to different shorts. Then there is not enough space to accommodate the char c, so it must come after that. And the entire struct must be suitably aligned for short so it will be 6 bytes on i386. The latter however, will pack a and b in the 17 lowest bits of the int, and since there is still one entire addressable byte left within the int, the c will be packed here too, and hence sizeof (struct y) will be 4.

Finally, you must really specify whether the int or char is signed or not - the default might be not what you expect! Standard leaves it up to the implementation, and GCC has a compile-time switch to change them.