Search code examples
cunions

Function pointer as "tag" in tagged union


One of the reasons why I tend to avoid using tagged unions is that I don't like the idea of the performance penalty that the switch/case statement for the tag might introduce if the number of tags is greater than 4 or so.

I just had the idea that instead of using a tag, one could set a pointer to a function that reads the last written value in the union. For example:

union u{
   char a;
   int b;
   size_t c;
};

struct fntag{
   void (*readval)(union u *ptr, void *out);
   union u val;
};

And then, whenever you write a value into val, you also update the readval pointer accordingly, so that it points to a function that reads the last field you wrote in the union. Yes, there's a difficult issue, and it's where to return the read value (because the same function pointer cannot point to functions returning different types). I chose to return the value through a pointer to void so that such function can also be "overloaded" with C11 _Generic() and thus casting and writing into different types for the output.

Of course, invoking a pointer to a function has a performance overhead, and I guess it's far heavier that checking the value of an enum, but at some point, if the number of tags is big, I believe it would be faster than switch/case.

My question is: Have you ever seen this technique used somewhere? (I haven't, and I don't know if it's because real world application of this would require _Generic() on the readval function, which requires C11, or if it's because of some problem I'm not noticing at this moment). How many tags do you guess you'd need to have for the pointer to function being faster -in current Intel CPUs- than the switch/case?


Solution

  • You could do that. In your case a more optimization friendly function signature would be size_t size_t (*)(union u U) (all the union values can be sort of fitted into size_t and the union is small enough for passing by value to be more efficient), but even with that, function calls have non-negligible overhead that tends to be significantly larger than that of a jump through a switch-generated jump table.

    Try something like:

    #include <stddef.h>
    enum en { e_a, e_b, e_c };
    union u{
       char a;
       int b;
       size_t c;
    };
    
    size_t u_get_a(union u U) { return U.a; }
    size_t u_get_b(union u U) { return U.b; }
    size_t u_get_c(union u U) { return U.c; }
    
    struct fntag{ size_t (*u_get)(union u U); union u u_val; };
    struct entag{ enum en u_type; union u u_val; };
    
    struct fntag fntagged1000[1000]; struct entag entagged1000[1000];
    
    void init(void) {
        for (size_t i=0; i<1000; i++)
            switch(i%3){
            break;case 0: 
                fntagged1000[i].u_val.a = i, fntagged1000[i].u_get = &u_get_a;
                entagged1000[i].u_val.a = i, entagged1000[i].u_type = e_a;
            break;case 1: 
                fntagged1000[i].u_val.b = i, fntagged1000[i].u_get = &u_get_b;
                entagged1000[i].u_val.b = i, entagged1000[i].u_type = e_b;
            break;case 2:
                fntagged1000[i].u_val.c = i, fntagged1000[i].u_get = &u_get_c;
                entagged1000[i].u_val.c = i, entagged1000[i].u_type = e_c;
            }
    }
    
    
    size_t get1000fromEnTagged(void)
    {
        size_t r = 0;
        for(int i=0; i<1000; i++){
            switch(entagged1000[i].u_type){
            break;case e_a: r+=entagged1000[i].u_val.a;
            break;case e_b: r+=entagged1000[i].u_val.b;
            break;case e_c: r+=entagged1000[i].u_val.c;
            /*break;default: __builtin_unreachable();*/
            }
        }
        return r;
    }
    
    size_t get1000fromFnTagged(void)
    {
        size_t r = 0;
        for(int i=0; i<1000; i++) r += (*fntagged1000[i].u_get)(fntagged1000[i].u_val);
        return r;
    }
    
    
    
    int main(int C, char **V)
    {
        size_t volatile r;
        init();
        if(!V[1]) for (int i=0; i<1000000; i++) r=get1000fromEnTagged();
        else for (int i=0; i<1000000; i++) r=get1000fromFnTagged();
    
    
    }
    

    At -O2, I'm getting more then twice the performance in the switch-based code.