
Why do tagged pointers exist?


I understand the theory behind tagged pointers and how they are used to store additional data in a pointer.
But I don't understand this part (from the Wikipedia article about tagged pointers).

Most architectures are byte-addressable (the smallest addressable unit is a byte), but certain types of data will often be aligned to the size of the data, often a word or multiple thereof. This discrepancy leaves a few of the least significant bits of the pointer unused

Why does this happen?
Does a pointer have only 30 bits (on 32-bit architectures), with the other 2 bits being a result of alignment?
Why are there 2 bits left unused in the first place?
And does this decrease the size of the addressable space (from 2^32 bytes to 2^30 bytes)?


Solution

  • Consider an architecture that uses 16-bit alignment and also 16-bit pointers (just to avoid having too many binary digits!). A pointer will only ever refer to memory locations that are multiples of 2 bytes (16 bits), but the pointer value is still precise down to the byte. So a pointer that, in binary, is:

    0000000000000100

    refers to the memory location 4 (the fifth byte in memory):

    ┌────────────────────┬───────────────────┬─────────────┬──────────────┐
    │ Address in Decimal │ Address in Binary │ 8─bit bytes │ 16─bit words │
    ├────────────────────┼───────────────────┼─────────────┼──────────────┤
    │ 0                  │ 0000000000000000  │ ┌─────────┐ │ ┌──────────┐ │
    │                    │                   │ └─────────┘ │ │          │ │
    │ 1                  │ 0000000000000001  │ ┌─────────┐ │ │          │ │
    │                    │                   │ └─────────┘ │ └──────────┘ │
    │ 2                  │ 0000000000000010  │ ┌─────────┐ │ ┌──────────┐ │
    │                    │                   │ └─────────┘ │ │          │ │
    │ 3                  │ 0000000000000011  │ ┌─────────┐ │ │          │ │
    │                    │                   │ └─────────┘ │ └──────────┘ │
    │ 4 - this one       │ 0000000000000100  │ ┌─────────┐ │ ┌──────────┐ │
    │                    │                   │ └─────────┘ │ │          │ │
    │ 5                  │ 0000000000000101  │ ┌─────────┐ │ │          │ │
    │                    │                   │ └─────────┘ │ └──────────┘ │
    │ 6                  │ 0000000000000110  │ ┌─────────┐ │ ┌──────────┐ │
    │                    │                   │ └─────────┘ │ │          │ │
    │ 7                  │ 0000000000000111  │ ┌─────────┐ │ │          │ │
    │                    │                   │ └─────────┘ │ └──────────┘ │
    │ ...                │                   │             │              │
    └────────────────────┴───────────────────┴─────────────┴──────────────┘
    

    With 16-bit alignment, there will never be a pointer referring to memory location 5, because it wouldn't be aligned; the next useful value is 6:

    0000000000000110

    Note that the least significant bit (the one on the far right) is still 0. In fact, for all valid pointer values on that architecture, that bit will be 0. That's what they mean by leaving "...a few of the least significant bits of the pointer unused." The pointer still has its full width, so the addressable space does not shrink; those low bits are simply known in advance to be zero. In my example it's just one bit, but if you had 32-bit alignment, it would be two bits at the end of the pointer value that would always be zero:

    ┌────────────────────┬───────────────────┬─────────────┬──────────────┐
    │ Address in Decimal │ Address in Binary │ 8─bit bytes │ 32─bit words │
    ├────────────────────┼───────────────────┼─────────────┼──────────────┤
    │  0                 │ 0000000000000000  │ ┌─────────┐ │ ┌──────────┐ │
    │                    │                   │ └─────────┘ │ │          │ │
    │  1                 │ 0000000000000001  │ ┌─────────┐ │ │          │ │
    │                    │                   │ └─────────┘ │ │          │ │
    │  2                 │ 0000000000000010  │ ┌─────────┐ │ │          │ │
    │                    │                   │ └─────────┘ │ │          │ │
    │  3                 │ 0000000000000011  │ ┌─────────┐ │ │          │ │
    │                    │                   │ └─────────┘ │ └──────────┘ │
    │  4                 │ 0000000000000100  │ ┌─────────┐ │ ┌──────────┐ │
    │                    │                   │ └─────────┘ │ │          │ │
    │  5                 │ 0000000000000101  │ ┌─────────┐ │ │          │ │
    │                    │                   │ └─────────┘ │ │          │ │
    │  6                 │ 0000000000000110  │ ┌─────────┐ │ │          │ │
    │                    │                   │ └─────────┘ │ │          │ │
    │  7                 │ 0000000000000111  │ ┌─────────┐ │ │          │ │
    │                    │                   │ └─────────┘ │ └──────────┘ │
    │  8                 │ 0000000000001000  │ ┌─────────┐ │ ┌──────────┐ │
    │                    │                   │ └─────────┘ │ │          │ │
    │  9                 │ 0000000000001001  │ ┌─────────┐ │ │          │ │
    │                    │                   │ └─────────┘ │ │          │ │
    │ 10                 │ 0000000000001010  │ ┌─────────┐ │ │          │ │
    │                    │                   │ └─────────┘ │ │          │ │
    │ 11                 │ 0000000000001011  │ ┌─────────┐ │ │          │ │
    │                    │                   │ └─────────┘ │ └──────────┘ │
    │ ...                │                   │             │              │
    └────────────────────┴───────────────────┴─────────────┴──────────────┘