c pointers casting printf memset

Why does casting the pointer change the value at the address?


So I was doing an exercise to see if I was using memset correctly.

Here's the original code I wrote, which was supposed to memset some addresses to the value 50:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void){
    int *block1 = malloc(2048);
    memset(block1, 50, 10);
    // int count = 0;
    for (int *iter = block1; (uint8_t *) iter < (uint8_t *)block1 + 10; iter = (int *) ((uint8_t *)iter + 1) ){
        printf("%p : %d\n", (void *)iter, *iter);
    }
    return 0;
}

I expected every address in memory to store the value 50. HOWEVER my output was:

(Address : Value)

0x14e008800 : 842150450
0x14e008801 : 842150450
0x14e008802 : 842150450
0x14e008803 : 842150450
0x14e008804 : 842150450
0x14e008805 : 842150450
0x14e008806 : 842150450
0x14e008807 : 3289650
0x14e008808 : 12850
0x14e008809 : 50

I was stuck on the problem for a while and tried a bunch of things until I randomly decided that maybe my pointer is the problem. I then tried a uint8_t pointer.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void){
    uint8_t *block1 = malloc(2048);
    memset(block1, 50, 10);
    for (uint8_t  *iter = block1; iter < block1 + 10; iter++ ){
        printf("%p : %d\n", (void *)iter, *iter);
    }
    return 0;
}

All I did was change the type of the block1 variable and my iter variable to be uint8_t pointers instead of int pointers and I got the correct result!

0x13d808800 : 50
0x13d808801 : 50
0x13d808802 : 50
0x13d808803 : 50
0x13d808804 : 50
0x13d808805 : 50
0x13d808806 : 50
0x13d808807 : 50
0x13d808808 : 50
0x13d808809 : 50

My question is then, why did that make such a difference?


Solution

  • My question is then, why did that make such a difference?

    Because the exact type of a pointer is hugely important. Pointers in C are not just memory addresses. Pointers are memory addresses, along with a notion of what type of data is expected to be found at that address.

    If you write

    uint8_t *p;
    ... p = somewhere ...
    printf("%d\n", *p);
    

    then in that last line, *p fetches one byte of memory pointed to by p.

    But if you write

    int *p;
    ... p = somewhere ...
    printf("%d\n", *p);
    

    where, yes, the only change is the type of the pointer, then in that exact same last line, *p now fetches four bytes of memory pointed to by p, interpreting them as a 32-bit int. (This assumes int on your machine is four bytes, which is pretty common these days.)

    When you called

    memset(block1, 50, 10);
    

    you were asking for the first 10 bytes of block1 (not all 2048 of them) to each be set to 50.

    When you used an int pointer to step over that block of memory, fetching (as we said earlier) four bytes of memory at a time, you got 4-byte integers where each of the 4 bytes contained the value 50. So the value you got was

    (((((50 << 8) | 50) << 8) | 50) << 8) | 50
    

    which just happens to be exactly 842150450.

    Or, looking at it another way, if you take that value 842150450 and convert it to hex (base 16), you'll find that it's 0x32323232, where 0x32 is the hexadecimal value of 50, again showing that we have four bytes each with the value 50.

    That all makes sense so far, but you were skating on thin ice in your first program. You had int *iter, but then you wrote

    for(iter = block1; (uint8_t *) iter < (uint8_t *)block1 + 10; iter = (int *) ((uint8_t *)iter + 1) )
    

    In that cumbersome increment expression

    iter = (int *) ((uint8_t *)iter + 1)
    

    you have contrived to increment the address in iter by just one byte. Normally, we say

    iter = iter + 1
    

    or just

    iter++
    

    and this means to increment the address in iter by several bytes, so that it points at the next int in a conventional array of int.

    Doing it the way you did had three implications:

    1. You were accessing a sort of sliding window of int-sized subblocks of block1. That is, you fetched an int made from bytes 1, 2, 3, and 4, then an int made from bytes 2, 3, 4, and 5, then an int made from bytes 3, 4, 5, and 6, etc. Since all the bytes had the same value, you got the same value for as long as the window stayed inside the 10 bytes you had set (the last three windows ran past them, which is why the last three values in your output differ), but this is a strange and generally meaningless thing to do.
    2. Three out of four of the int values you fetched were unaligned. It looks like your processor let you get away with this, but some processors would have given you a Bus Error or some other kind of memory-access exception, because unaligned accesses aren't always allowed.
    3. You also violated the rule about strict aliasing.