Tags: c++, c, integer, bit

Why is 'int' (and not 'char') used to represent bits?


I am thinking about learning some bit manipulation.
I know that memory in computers is byte-addressed. [1 byte = 8 bits]
I know that the standard practice is to use an int [4 bytes] and 'twiddle' the bits in it.
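
By "twiddle" I mean operations like this (a minimal sketch with unsigned int; the specific bit positions are arbitrary):

    #include <stdio.h>

    int main(void)
    {
        unsigned int flags = 0;

        flags |= 1u << 3;                 /* set bit 3 */
        flags ^= 1u << 7;                 /* toggle bit 7 */
        flags &= ~(1u << 3);              /* clear bit 3 */
        int bit7 = (flags >> 7) & 1u;     /* test bit 7 */

        printf("bit 7 is %s\n", bit7 ? "set" : "clear");
        return 0;
    }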

But why not a 'char' or a 'u_int8_t'?
These are closer to one byte in size (more modular) and support the same operations as ints. (Although I guess chars are promoted to ints by the compiler, piling up extra operations.)

Also, if not these, then could we use the 'long' data type on a 64-bit processor? (My gut says it is possible but not portable to x86.)

TL;DR - What is the rationale behind using 'int' as a data store for bits (in an algorithm-design sense)?

Edit: Someone pointed out that "a data store for bits" is unclear. What I meant is the data type used to build a bit-wise data structure like a bit array or a bit field. (That is what I meant by an algorithmic sense.)



Solution

  • What is the rationale behind using 'int' as a data store for bits?

    A long time ago, int represented the "word" of the processor - roughly, the datatype the processor could operate on fastest. That's why in any calculation a char is promoted to an int: if you use char in your code, it becomes an int anyway, because the assumption was that the processor would use the assembly instructions for its word.
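
    You can observe that promotion directly; a minimal sketch (the printed sizes depend on the platform, but sizeof(a + b) will match sizeof(int)):

        #include <stdio.h>

        int main(void)
        {
            char a = 1, b = 2;
            /* In a + b both operands are promoted to int before the
               addition, so the expression's result has int's size. */
            printf("sizeof(char)  = %zu\n", sizeof(char));
            printf("sizeof(a + b) = %zu\n", sizeof(a + b));  /* typically 4 */
            return 0;
        }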

    int takes its history from the B programming language, which had only one datatype - the processor word. When creating C, the prophet Dennis Ritchie added char to represent one byte, and int kept representing the processor word as in B. That's also why C had implicit int, so you could write auto var; like in B. See https://www.bell-labs.com/usr/dmr/www/chist.html and https://en.wikipedia.org/wiki/B_(programming_language) .

    Nowadays, you would prefer int_fastX_t, like int_fast8_t or int_fast32_t, depending on how many bits you need. Or, best of all, detect the processor width at build time with your build system and compile the library optimized for that specific word size.
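
    A minimal sketch of the "fast" types from <stdint.h> (the widths the compiler picks are platform-dependent, which is the point):

        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
            /* The implementation chooses the fastest type that is at
               least as wide as requested. */
            int_fast8_t  small = 42;
            int_fast32_t wide  = 123456;

            printf("int_fast8_t  : %zu bytes\n", sizeof(small));
            printf("int_fast32_t : %zu bytes\n", sizeof(wide));
            return 0;
        }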

  • My gut says it is possible but not portable to x86.

    Using a different datatype is an optimization that results in faster code on a specific architecture, not a portability issue. If you stick to what the C language guarantees, the code will work everywhere. long has at least 32 bits - on a 64-bit processor it may be 32 bits, it may be 64 bits, it may even be 36 bits, or any width of at least 32 bits. Its size depends solely on the compiler and the system. Nowadays only 32 and 64 bits matter in practice; see https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models . Use uint64_t if you mean 64 bits.
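
    A minimal sketch that makes the difference visible (the size printed for long depends on the data model; uint64_t is always 8 bytes):

        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
            /* long only guarantees at least 32 bits; its real width is a
               property of the data model (LP64, LLP64, ...). */
            printf("long     : %zu bytes\n", sizeof(long));
            printf("uint64_t : %zu bytes\n", sizeof(uint64_t));  /* always 8 */
            return 0;
        }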

    Commonly, you would write such code with unsigned int. It will be 32 bits on today's processors, and any of today's mainstream processors has instructions for all 32-bit operations. But if you build with a compiler like xc8, which has a 16-bit int, your code may fail. To protect against that, you would use int_fast32_t.
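
    As an illustration, here is a minimal bit-array sketch built on uint_fast32_t (the unsigned counterpart, since bit twiddling is usually done on unsigned types; the helper names bit_set and bit_get are made up for this example):

        #include <stdint.h>
        #include <stdio.h>

        /* Rely on only 32 value bits per word, which uint_fast32_t
           guarantees even if the type is wider on some platform. */
        #define WORD_BITS 32u

        static void bit_set(uint_fast32_t *words, size_t i)
        {
            words[i / WORD_BITS] |= (uint_fast32_t)1 << (i % WORD_BITS);
        }

        static int bit_get(const uint_fast32_t *words, size_t i)
        {
            return (words[i / WORD_BITS] >> (i % WORD_BITS)) & 1u;
        }

        int main(void)
        {
            uint_fast32_t bits[4] = {0};   /* room for at least 128 bits */

            bit_set(bits, 5);
            bit_set(bits, 100);
            printf("%d %d %d\n",
                   bit_get(bits, 5), bit_get(bits, 6), bit_get(bits, 100));  /* 1 0 1 */
            return 0;
        }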