c memory language-lawyer endianness

Representation of int in memory


On architectures where int is represented using multiple bytes in memory, what constraints does the C Standard impose on the possible representations? Most current systems use either little-endian or big-endian representations, but is it possible to have a conforming system with a different representation? How different can it be?


Solution

  • what constraints does the C Standard impose regarding possible representations?

    3 encodings are allowed for signed integers: 2's complement, 1s' complement, sign-magnitude. With a non-2's complement encoding, the bit pattern for negative zero could be either a usable -0 or a trap representation.

    int must be at least 16 bits wide (a range of at least [-32767...32767]). Real historic implementations have used 36-bit and 64-bit int.
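As a rough illustration of the two points above, here is a minimal C sketch that classifies the signed encoding and checks the guaranteed minimum range. The names `sign_repr`, `sign_representation` and `print_int_summary` are made up for this example. The classification relies on the low two bits of -1 differing between the three encodings: 11 for 2's complement, 10 for 1s' complement, 01 for sign-magnitude.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>
#include <stdio.h>

/* Every conforming implementation must satisfy this minimum range. */
static_assert(INT_MAX >= 32767 && INT_MIN <= -32767,
              "int must cover at least [-32767, 32767]");

/* Hypothetical names, introduced only for this sketch. */
enum sign_repr { TWOS_COMPLEMENT, ONES_COMPLEMENT, SIGN_MAGNITUDE };

/* The low two bits of -1 identify the encoding:
   ...11 (2's complement), ...10 (1s' complement), ...01 (sign-magnitude). */
enum sign_repr sign_representation(void)
{
    switch (-1 & 3) {
    case 3:  return TWOS_COMPLEMENT;
    case 2:  return ONES_COMPLEMENT;
    default: return SIGN_MAGNITUDE;
    }
}

/* Print the encoding, the range and the storage size of int. */
void print_int_summary(void)
{
    static const char *const names[] = {
        "2's complement", "1s' complement", "sign-magnitude"
    };

    printf("encoding: %s\n", names[sign_representation()]);
    printf("int range: [%d, %d]\n", INT_MIN, INT_MAX);
    printf("sizeof(int) = %zu, CHAR_BIT = %d\n", sizeof(int), CHAR_BIT);
}
```

Calling `print_int_summary()` from `main` on a mainstream compiler should report 2's complement; the other two branches are there only to show what the Standard historically permitted.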

    but is it possible to have a conforming system with a different representation?

    Sample: PDP-endian

    The 32-bit value 0x01020304 is stored as the bytes 2, 1, 4, 3. See also @chqrlie.
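To see which of these layouts a given machine uses, one can copy the same 32-bit value into a byte array and compare it against the known patterns. This is only a sketch: it assumes 8-bit bytes and that `uint32_t` exists, and the names `byte_order`, `classify_bytes` and `native_byte_order` are hypothetical.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>
#include <stdint.h>
#include <string.h>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

/* Hypothetical names, introduced only for this sketch. */
enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_PDP, ORDER_OTHER };

/* Classify the in-memory bytes of the 32-bit value 0x01020304. */
enum byte_order classify_bytes(const unsigned char b[4])
{
    static const unsigned char little[4] = { 4, 3, 2, 1 };
    static const unsigned char big[4]    = { 1, 2, 3, 4 };
    static const unsigned char pdp[4]    = { 2, 1, 4, 3 };

    if (memcmp(b, little, 4) == 0) return ORDER_LITTLE;
    if (memcmp(b, big, 4) == 0)    return ORDER_BIG;
    if (memcmp(b, pdp, 4) == 0)    return ORDER_PDP;
    return ORDER_OTHER;
}

/* Inspect how this machine actually stores 0x01020304. */
enum byte_order native_byte_order(void)
{
    uint32_t value = UINT32_C(0x01020304);
    unsigned char bytes[sizeof value];

    /* memcpy into unsigned char is the well-defined way to read
       an object's representation. */
    memcpy(bytes, &value, sizeof value);
    return classify_bytes(bytes);
}
```

Passing the bytes 2, 1, 4, 3 to `classify_bytes` yields `ORDER_PDP`, while `native_byte_order()` on current hardware is expected to return `ORDER_LITTLE` or `ORDER_BIG`.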

    How different can it be?

    int may have padding bits, char cannot. I do not know of any implementation where int has padding.

    int could be 1 "byte" when a "byte" is at least 16 bits.
    IIRC, some graphics processors used 64-bit "byte", char, int, long, long long.

    I once used an implementation with a 64-bit long and unsigned long where the unsigned long had 1 padding bit, such that ULONG_MAX == LONG_MAX. Compliant but unusual. In theory, UINT_MAX == INT_MAX is possible, though I have never heard of such an implementation.
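Padding can be detected by comparing the storage size of a type with the number of value bits implied by its maximum value. The sketch below does this for int and unsigned; the function names are invented for this example, and on ordinary platforms both counts are expected to be 0.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>

/* The Standard requires every non-negative int value to fit in unsigned. */
static_assert(UINT_MAX >= (unsigned)INT_MAX,
              "unsigned must hold every non-negative int");

/* Count the value bits of a maximum of the form 2^n - 1. */
unsigned value_bits(unsigned long long max)
{
    unsigned n = 0;

    while (max != 0) {
        n++;
        max >>= 1;
    }
    return n;
}

/* Storage bits of unsigned that are not value bits. */
unsigned unsigned_padding_bits(void)
{
    return (unsigned)(sizeof(unsigned) * CHAR_BIT) - value_bits(UINT_MAX);
}

/* Storage bits of int that are neither value bits nor the sign bit. */
unsigned int_padding_bits(void)
{
    return (unsigned)(sizeof(int) * CHAR_BIT) - (value_bits(INT_MAX) + 1);
}

/* True only on the unusual layout where unsigned has a padding bit
   in place of int's sign bit, so that UINT_MAX == INT_MAX. */
int unsigned_range_equals_int_range(void)
{
    return UINT_MAX == (unsigned)INT_MAX;
}
```

On the implementation described above, the same approach applied to unsigned long would report 1 padding bit and equal maxima.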

    In 2020, I suspect the following are universal.

    • Endian: either big or little.

    • 2's complement. (Next C might require this.)

    • "byte size" of 8 (maybe 16, 32), int is 16 or 32 bit.

    • No padding.
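Code that relies on the assumptions in this list can state them explicitly so that an exotic platform fails at compile time instead of misbehaving. This is one possible sketch, assuming a C11 compiler; `byte_order_is_big_or_little` is a hypothetical helper and is only a quick check, not an exhaustive one.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>
#include <string.h>

static_assert(CHAR_BIT == 8, "bytes are assumed to be 8 bits");

/* INT_MIN == -INT_MAX - 1 only holds for 2's complement. */
static_assert(INT_MIN == -INT_MAX - 1, "int is assumed to be 2's complement");

/* With no padding, the storage size fixes the maximum value. */
static_assert((sizeof(int) * CHAR_BIT == 16 && INT_MAX == 32767) ||
              (sizeof(int) * CHAR_BIT == 32 && INT_MAX == 2147483647),
              "int is assumed to be 16 or 32 bits with no padding");

/* Byte order is checked at run time here: the low-order byte of 1 must
   be either the first or the last byte of the object. This rejects
   PDP-style layouts but does not inspect the middle bytes. */
int byte_order_is_big_or_little(void)
{
    unsigned one = 1;
    unsigned char bytes[sizeof one];

    memcpy(bytes, &one, sizeof one);
    return bytes[0] == 1 || bytes[sizeof one - 1] == 1;
}
```

If any of the static assertions fails, the translation unit does not compile, which documents the assumption at the point where it matters.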