I know that sizeof(char) will always be 1, and that this is in units of bytes, and that a byte can be made up of any number of bits (at least 8, I believe, but I'm not positive on that).
I also commonly see references that describe C data type sizes only in terms of the relationships between them, such as "sizeof(int) <= sizeof(long)".
My question is basically: What would "sizeof(int)" evaluate to on a system where a byte is 8 bits and an int is 39 bits (or some other width that is not evenly divisible by CHAR_BIT)?
My guess is that sizeof() returns the minimum number of bytes required to store the type, rounding up to the next whole byte. So in my example with a 39-bit int, the result of sizeof(int) would be 5.
Is this correct?
Also, is there an easy way to determine the number of bits a particular type can hold that is 100% portable and does not require the inclusion of any headers? This is more for a learning experience than an actual application; I would just use the stdint types in practice. I was thinking maybe something along the lines of declaring a variable, initializing it to ~0, and then looping and left-shifting it until it's zero.
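Something roughly like this is what I had in mind (untested, and count_value_bits is just a name I made up for illustration):

    /* Untested sketch of the idea above: count the value bits of
       unsigned int by setting them all and shifting until none remain.
       No headers are needed for this. */
    static int count_value_bits(void)
    {
        unsigned int v = ~0u;   /* all value bits set */
        int bits = 0;
        while (v != 0) {
            v <<= 1;            /* shifting an unsigned value is well-defined */
            bits++;
        }
        return bits;
    }

(I'm only worried about unsigned types here, since shifting signed values gets into murkier territory.)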
Thanks!
6.2.6 Representations of types
6.2.6.1 General
- The representations of all types are unspecified except as stated in this subclause.
- Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined.
- Values stored in unsigned bit-fields and objects of type unsigned char shall be represented using a pure binary notation.49)
- Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value. Values stored in bit-fields consist of m bits, where m is the size specified for the bit-field. The object representation is the set of m bits the bit-field comprises in the addressable storage unit holding it. Two values (other than NaNs) with the same object representation compare equal, but values that compare equal may have different object representations.

49) A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral powers of 2, except perhaps the bit with the highest position. (Adapted from the American National Dictionary for Information Processing Systems.) A byte contains CHAR_BIT bits, and the values of type unsigned char range from 0 to 2^CHAR_BIT − 1.
My question is basically: What would "sizeof(int)" evaluate to on a system where a byte is 8 bits and an int is 39 bits (or some other width that is not evenly divisible by CHAR_BIT)?
The implementation would have to map CHAR_BIT-sized storage units onto odd-sized words such that the above requirements hold, probably with a significant performance penalty. A 39-bit word can hold up to four 8- or 9-bit storage units, so sizeof(int) would probably evaluate to 4 (the leftover bits of the word would simply go unused).
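To make that arithmetic concrete, here's a toy illustration of the mapping; the word and byte widths are made-up parameters chosen to match the 39-bit example, not anything a real header would give you:

    #include <stdio.h>

    /* Toy illustration: how many whole CHAR_BIT-sized storage units fit
       into a hardware word of a given width.  The widths below are
       hypothetical. */
    static unsigned units_per_word(unsigned word_bits, unsigned unit_bits)
    {
        return word_bits / unit_bits;   /* truncating division: whole units only */
    }

    int main(void)
    {
        printf("39-bit word, 8-bit byte: %u storage units\n", units_per_word(39, 8)); /* 4 */
        printf("39-bit word, 9-bit byte: %u storage units\n", units_per_word(39, 9)); /* 4 */
        return 0;
    }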
Alternatively, the implementor can simply decide it's not worth the hassle and set CHAR_BIT to 39; everything, including individual characters, then takes up one or more full words, leaving up to 31 bits unused depending on the type.
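If you ever found yourself on such an implementation, a quick (hypothetical) sanity check is just to print the sizes; with CHAR_BIT set to 39, every sizeof line below could legitimately print 1:

    #include <stdio.h>
    #include <limits.h>

    /* On a hypothetical CHAR_BIT == 39 implementation, each of these
       integer types could occupy a single "byte" (one full word), with
       the narrower types carrying padding or unused bits. */
    int main(void)
    {
        printf("CHAR_BIT      = %d\n", CHAR_BIT);
        printf("sizeof(char)  = %zu\n", sizeof(char));
        printf("sizeof(short) = %zu\n", sizeof(short));
        printf("sizeof(int)   = %zu\n", sizeof(int));
        printf("sizeof(long)  = %zu\n", sizeof(long));
        return 0;
    }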
There have been real-world examples of this sort of thing in the past. One of the old DEC machines (the 36-bit PDP-10, if memory serves) used 7-bit ASCII for character values; 5 characters could be stored in a single 36-bit word, with one bit unused. All other types took up a full word. If the implementation set CHAR_BIT to 9, you could cleanly map CHAR_BIT-sized storage units onto 36-bit words, but again, that may incur a significant performance penalty if the hardware expects 5 characters per word.