Practical way of implementing comparison of unions in C


For some testing purposes, I need to compare two unions to see if they are identical. Is it enough to compare all members one by one?

#include <stdbool.h>

union Data {
    int i;
    float f;
    char* str;
};

bool isSame(union Data left, union Data right)
{
    return (left.i == right.i) && (left.f == right.f) && (left.str == right.str);
}

My hunch is that it could fail if one of the unions first contained a larger type and was then switched to a smaller type. I have seen suggestions to wrap the union in a struct that keeps track of which data type the union currently holds (like here: What is the correct way to check equality between instances of a union?), but I don't see how that would be implemented in practice. Would I not need to manually set the union type everywhere I set the union value?

struct myData
{
    int dataType;
    union {
        ...
    } u;
};

void someFunc()
{
    struct myData my_data_value = {0};
    my_data_value.u.i = 5;
    my_data_value.dataType = ENUM_TYPE_INTEGER;

    my_data_value.u.f = 5.34f;
    my_data_value.dataType = ENUM_TYPE_FLOAT;
    
    ...
}

It does not seem warranted to double all the code that touches my union simply to be able to compare union values reliably. Am I missing some obvious, smart way to go about this?


Solution

  • If your proposal worked, then you could achieve the same effect without multiple comparisons by using memcmp(&left, &right, sizeof left). But that won't work and neither will your proposal, for the same reason.

    First, assigning to a union member which does not occupy all the bytes allocated to the union leaves the unoccupied bytes with unspecified values. Most likely they will keep their previous values, but any value is possible. Comparing the values of such bytes has an unspecified result.

    You might think that using memset to zero the bytes of the union before assigning a member would allow the comparison to work, but the standard does not require the unused bytes to remain unmodified by the assignment. Moreover, many compilers will optimise away the attempt to clear the union on the grounds that it has no legal effect if the next statement gives the union a new value.

    There are other reasons why comparing union members which are not the current value does not work, and they apply even if neither value includes padding.

    For example, if you don't know that the two union values currently have the same active member, you might get a false equivalence. (Every float has the same bit pattern as some int but the two values are certainly not the same.)

    Less obviously, it's possible for two values with different bit patterns to actually be equal. (Floating point 0.0 and -0.0 are considered equal, for example.)

    Finally, not every bit pattern is a valid float; if one or both of the union values is an int whose bit pattern corresponds to a floating NaN, trying to compare the values as floats will certainly produce the wrong answer (a NaN is not equal to itself) and may raise a floating point exception.
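The last two points can be checked directly. The following is a minimal sketch, assuming float uses the IEEE 754 binary32 format and that NAN from <math.h> is available; same_bits and same_value are hypothetical helper names introduced only for illustration.

```c
#include <math.h>     /* NAN, used when calling the helpers below */
#include <stdbool.h>
#include <string.h>

/* Compare two floats byte by byte, which is what memcmp on a union
   holding a float would effectively do. */
bool same_bits(float a, float b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Compare two floats by value, which is what == does. */
bool same_value(float a, float b)
{
    return a == b;
}
```

Under those assumptions, same_value(0.0f, -0.0f) is true while same_bits(0.0f, -0.0f) is false, and the opposite happens for same_bits(NAN, NAN) and same_value(NAN, NAN). Neither a bytewise nor a memberwise comparison matches "the two unions hold the same value" in every case.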

    In short, if you don't know which type is active for a union, you cannot usefully use the union value, other than to assign it to another object of the same union type. That means that there must be some mechanism, internal or external, which identifies the active type of the union.

    The choice between external mechanisms (used, for example, in yacc-generated parsers) and internal mechanisms (so-called "discriminated unions", as you suggest at the end of your question) will depend on the precise application environment.
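As for the worry about doubling every assignment: one possible way to implement the internal mechanism is to route all writes through small constructor functions, so the tag and the value are always set together. This is only a sketch; the names TaggedData, DataType, make_int, make_float, make_str and is_same are hypothetical and not taken from the question or from any library.

```c
#include <stdbool.h>

/* Hypothetical tag values identifying the active union member. */
enum DataType { TYPE_NONE, TYPE_INT, TYPE_FLOAT, TYPE_STR };

struct TaggedData {
    enum DataType type;   /* which member of u is currently active */
    union {
        int i;
        float f;
        char *str;
    } u;
};

/* Constructors set tag and value together, so callers never touch the tag. */
struct TaggedData make_int(int i)
{
    struct TaggedData d;
    d.type = TYPE_INT;
    d.u.i = i;
    return d;
}

struct TaggedData make_float(float f)
{
    struct TaggedData d;
    d.type = TYPE_FLOAT;
    d.u.f = f;
    return d;
}

struct TaggedData make_str(char *s)
{
    struct TaggedData d;
    d.type = TYPE_STR;
    d.u.str = s;
    return d;
}

/* Two values are the same only if the tags match and the active
   members compare equal; inactive members are never read. */
bool is_same(const struct TaggedData *a, const struct TaggedData *b)
{
    if (a->type != b->type)
        return false;
    switch (a->type) {
    case TYPE_NONE:  return true;
    case TYPE_INT:   return a->u.i == b->u.i;
    case TYPE_FLOAT: return a->u.f == b->u.f;
    case TYPE_STR:   return a->u.str == b->u.str; /* pointer identity, as in the question */
    }
    return false;
}
```

With this arrangement, code that previously wrote `my_data_value.u.i = 5;` would instead write `my_data_value = make_int(5);`, which is one statement rather than two. The string case compares pointers, as the question's isSame does; comparing contents with strcmp would be a different design choice, and it would need a policy for null pointers.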