
Does overwriting values of a union produce undefined behavior?


I am trying to understand what is happening with the value being stored in my union memory. I have this code snippet to use as an example:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main() {
    
    union myUnion {
        int i;
        float f;
        char c;
    };

    union myUnion u;

    u.i = 42;    // store an integer value in the union
    printf("%d\n", u.i);    // prints 42

    u.f = 3.14;  // store a float value in the union
    printf("%f\n", u.f);   // prints 3.140000

    u.c = 'A';   // store a character value in the union
    printf("%c\n", u.c);   // prints 'A'

    printf("%d\n", u.i);   // prints 1078523201
    printf("%f\n", u.f);  // print 3.139969
}

Everything works fine until the line printf("%d\n", u.i);, which prints 1078523201. A different value is expected, because the original int value of 42 should have been overwritten in memory. Is this happening because it prints the int (32 bits long) interpretation of whatever remains in memory from the earlier assignments to the union?

Mem Addr | Value
0x1000   | ASCII value of 'A' from setting u.c (8 bits)
0x1001   | remnants from setting u.f
0x1002   | remnants from setting u.f
0x1003   | remnants from setting u.f

But then if this is true, why does the print of u.f provide an accurate result? Is this because the floating point number was saved in a register and the code is just reusing that as opposed to going out to the memory location of the union?


Solution

  • Is this happening because it is printing the int (32 bits long) version of what remains in memory from previously setting the union?

    Yes, and it is legal to do in C (see the endnote).

    For a union the individual members share the same memory.

    So in your case u.i and u.f share the same memory (they are both 32 bits in your case). This means that when you set u.i to some value, then change u.f, and afterwards read u.i, you will see a different result than the value you assigned to u.i.

    u.c also shares memory with u.i and u.f, so when you assign a value to u.c, it will also change u.i and u.f.

    In the end it's about how the bit pattern of an integer, of a float, and of a char is stored. This includes both the bit-level encoding and the endianness of your system.

    Inside the computer there are bits. Sequences of bits. What they mean depends on what you want them to mean... This is what the type system is doing for you. And unions are a way to look at the same memory (same bit pattern) as different types.

    Here is your program with a little change. The code now dumps the raw memory of the union object.

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>
    
    union myUnion {
        int i;
        float f;
        char c;
        unsigned uns;
    };
    
    void dump_union(union myUnion* u)
    {
        unsigned char* p = (unsigned char*)u;
        printf("union memory : ");
        for (size_t i = 0; i < sizeof(union myUnion); ++i) printf("%02X ", p[i]);
        printf("\n\n");
    }
    
    int main(void) 
    {    
        union myUnion u;
    
        u.i = 42;    // store an integer value in the union
        printf("%d\n", u.i);    // prints 42
        dump_union(&u);
    
        u.f = 3.14;  // store a float value in the union
        printf("%f\n", u.f);   // prints 3.140000
        dump_union(&u);
    
        u.c = 'A';   // store a character value in the union
        printf("%c\n", u.c);   // prints 'A'
        dump_union(&u);
    
        printf("%d\n", u.i);   // prints 1078523201
        dump_union(&u);
        printf("%f\n", u.f);  // print 3.139969
        dump_union(&u);
        
        return 0;
    }
    

    On my system the output is:

    42
    union memory : 2A 00 00 00 
    
    3.140000
    union memory : C3 F5 48 40 
    
    A
    union memory : 41 F5 48 40 
    
    1078523201
    union memory : 41 F5 48 40 
    
    3.139969
    union memory : 41 F5 48 40 
    

    The first thing to notice is that my system is little endian. That means that the least significant byte of an object is stored at the lowest address.

    So when you see 2A 00 00 00 in the above output, it means that u.i has the representation 0000002A for the integer value 42 decimal.

    Likewise when you see C3 F5 48 40 it means that u.f has the representation 4048F5C3 for the float value 3.14

    Assigning to u.i or u.f overwrites all of the union's memory because they are both 4 bytes on my system.

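    Assigning to u.c, however, only changes the byte at the lowest address and leaves the remaining bytes unchanged on my system, i.e. C3 F5 48 40 becomes 41 F5 48 40. Note that the C standard does not guarantee this: the bytes that do not belong to the member being written take unspecified values (C11 6.2.6.1), so the leftover bytes happen to survive here but should not be relied on.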

    Because my system is little endian, the bit pattern for u.f changes from 4048F5C3 to 4048F541. Only the least significant byte was changed. Because of the way float is encoded, the least significant byte holds the lowest bits of the fraction, so this is only a minor change of the floating point value, i.e. when printing u.f the value changed from 3.14 to 3.139969.

    If you print u.i just before and after setting u.c you will see a similar change of the integer value.

    Endnote

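    In C, writing one union member and then reading a different one is allowed: the stored bytes are reinterpreted as the type of the member you read (C11 6.5.2.3 calls this "type punning"). It is still something you should do with great care. System endianness may impact the final result.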

    Another thing to watch out for is trap representations. A valid bit pattern for one type may be a trap representation for another type, and reading a trap representation through a non-character type is undefined behavior. So you may end up with unexpected results.