
Does Indirection operator change memory representation?


Ok, I feel stupid asking this, but why does the code below output different lines?

To print the first line I take the address of the first byte of an array, interpret it as a pointer to uint16_t, take the value and print its bits one by one.

For the second line I take a pointer to the first byte, interpret it as a pointer to uint8_t, take the value and print its bits one by one. Then I do the same with the second byte.

As I don't modify the memory allocated for the array, only interpret it in different ways, I expect the output to be the same, but the order of bytes is different.

I'm probably missing something, but the only guess I have is that the indirection operator does something I don't expect.

#include <cstdint>   // uint8_t, uint16_t
#include <cstdio>    // printf

int main() {
  uint8_t u[2];
  u[0] = 170;
  u[1] = 85;

  // Reinterpret the two bytes as one uint16_t and print its bits,
  // most significant bit first.
  for (int i = 15; i >= 0; --i) {
    printf("%u", ((*(uint16_t*)u) >> i) & 0x0001);
  }
  printf("\n");

  // Print the bits of each byte separately, in the order the bytes
  // appear in memory.
  for (int i = 7; i >= 0; --i) {
    printf("%u", ((*(uint8_t*)u) >> i) & 0x01);
  }
  for (int i = 7; i >= 0; --i) {
    printf("%u", ((*(uint8_t*)(u + 1)) >> i) & 0x01);
  }
  printf("\n");
}

Output

0101010110101010 
1010101001010101

Update #1: Please ignore the allocation; yes, the example code doesn't work on every OS, but it is just a simplified example.
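
For reference, the same reinterpretation can be done without the pointer cast by copying the bytes into a uint16_t with memcpy. A minimal sketch (not part of the original question); the printed bits still depend on the host's byte order:

#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
  uint8_t u[2];
  u[0] = 170;
  u[1] = 85;

  // Copy the raw bytes into a properly aligned uint16_t instead of
  // casting the pointer; the value still depends on the host's byte order.
  uint16_t v;
  std::memcpy(&v, u, sizeof v);

  for (int i = 15; i >= 0; --i) {
    printf("%u", (unsigned)((v >> i) & 0x0001));
  }
  printf("\n");
}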

Update #2: I knew about endianness, but what I missed was the difference between logical and physical bit representation. In the example above, even though the physical representation is unchanged, I print the logical representation, which is affected by endianness. Big thanks to @john-kugelman for explaining that.


Solution

  • On Intel-based platforms, numbers are stored in little endian order. The least significant byte is first, the most significant last. This is the opposite of how we conventionally read numbers. If we wrote numbers in little endian instead of big endian order, one thousand twenty three would be written 3201 instead of 1023.

    When you interpret the bytes in the byte array as a 16-bit integer, the first byte (170) is interpreted as the least significant byte and the second byte (85) is the most significant. But when you print the bytes yourself, you print them in the opposite order. That's where the mismatch is coming from.
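
    For example, with this array a little endian machine sees 0x55AA (21930) while a big endian machine would see 0xAA55 (43605). A small sketch (not from the original answer) that computes both interpretations by hand:

        #include <cstdint>
        #include <cstdio>

        int main() {
          uint8_t u[2] = {170, 85};   // in memory: 0xAA first, then 0x55

          // Little endian interpretation: the first byte is the least significant.
          uint16_t le = (uint16_t)(u[0] | (u[1] << 8));   // 0x55AA == 21930

          // Big endian interpretation: the first byte is the most significant.
          uint16_t be = (uint16_t)((u[0] << 8) | u[1]);   // 0xAA55 == 43605

          printf("little endian: 0x%04X\n", (unsigned)le);
          printf("big endian:    0x%04X\n", (unsigned)be);
        }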

    Endianness is a platform-specific property. Most non-Intel architectures use the more "natural" big endian order. Unfortunately for us, Intel-based architectures are the most common. As it happens, almost all network traffic is big endian, also known as "network byte order". When Intel-based machines talk on the Internet they do a lot of byte swapping during both sending and receiving of data.
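
    As an aside, on POSIX systems the htons/ntohs functions perform exactly this host-to-network conversion. A minimal sketch, assuming the POSIX <arpa/inet.h> header is available:

        #include <arpa/inet.h>   // POSIX header for htons/ntohs
        #include <cstdint>
        #include <cstdio>

        int main() {
          uint16_t host = 0x55AA;
          uint16_t net  = htons(host);   // convert to big endian ("network byte order")

          // On a little endian machine the two values differ (bytes swapped);
          // on a big endian machine htons is a no-op and they are equal.
          printf("host order:    0x%04X\n", (unsigned)host);
          printf("network order: 0x%04X\n", (unsigned)net);
        }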

    I expected this mismatch to happen if I print that uint16_t itself. What I don't understand is why it happens when I try to get its bits.

    Reading its bits with bit masking and shifting operations doesn't read the physical bits in memory from left to right; it reads the logical bits from most to least significant. On a little endian architecture, most-to-least significant equates to right-to-left order.
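
    A small sketch of that distinction; the values in the comments assume a little endian host:

        #include <cstdint>
        #include <cstdio>
        #include <cstring>

        int main() {
          uint8_t u[2] = {170, 85};

          uint16_t v;
          std::memcpy(&v, u, sizeof v);

          // Logical view: shifts and masks operate on the value, so v >> 8 is
          // always the most significant byte, wherever it sits in memory.
          printf("logical high byte: %u, low byte: %u\n",
                 (unsigned)(v >> 8), (unsigned)(v & 0xFF));   // 85, 170

          // Physical view: the bytes as they are actually laid out in memory.
          unsigned char raw[sizeof v];
          std::memcpy(raw, &v, sizeof v);
          printf("bytes in memory:   %u, %u\n",
                 (unsigned)raw[0], (unsigned)raw[1]);         // 170, 85
        }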

    Also note that endianness means the bytes are swapped, not the bits. Bits aren't swapped in little endian architectures, bytes are. Bits can't be swapped because they're not individually addressable. You can only get at them with shifts and masks.
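
    To see that only whole bytes move, here is a minimal sketch that swaps the two bytes of a 16-bit value with shifts; each byte keeps its own bit pattern:

        #include <cstdint>
        #include <cstdio>

        int main() {
          uint16_t v = 0xAA55;

          // Swapping endianness moves whole bytes with shifts and masks;
          // the bit pattern inside each byte (0xAA, 0x55) is untouched.
          uint16_t swapped = (uint16_t)((v >> 8) | (v << 8));

          printf("original: 0x%04X\n", (unsigned)v);         // 0xAA55
          printf("swapped:  0x%04X\n", (unsigned)swapped);   // 0x55AA
        }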