c++ clang

Clang 15+ with -O3 optimisation causing infinite loop with casting packed structs


I've been trying to resolve an issue I ran into when upgrading from Clang 14: the compiler appears to treat a pointer as out of scope, which causes the original data to be deallocated or overwritten.

#include <cstdio>
#include <cstdint>

struct GenericInterface
{
    uint8_t size;
} __attribute__((packed));

struct SpecificInterface
{
    uint8_t size;
    uint8_t setting;
} __attribute__((packed));

int main()
{
    uint8_t setting = 5;
    uint8_t size = (uint8_t)sizeof(SpecificInterface);
    SpecificInterface specific { size, setting };

    GenericInterface* generic = (GenericInterface*)&specific;
    GenericInterface* genericEnd = (GenericInterface*)((uint8_t*)generic + size);

    printf("Starts off %u\n", generic->size);
    while (generic < genericEnd)
    {
        printf("Checking %u\n", generic->size);
        if (generic->size == size)
            break;
        generic = (GenericInterface*)((uint8_t*)generic + generic->size);
    }

    printf("Done\n");

    return 0;
}

In this scenario, I have a generic packed struct which only contains a size variable, and potentially many different structs which also have a size variable as their first member. The purpose is so that these other structs (like SpecificInterface) can be cast to GenericInterface in order to iterate through multiple structs which are packed together in memory.

In this specific example, the while loop should never loop and should break instantly. However, I've noticed that when compiling with Clang 15+ with the -O3 flag, it gets stuck in an infinite loop. On some versions it's not an infinite loop, but the printed value seems to be corrupted. If you make the generic variable volatile, it works. If you use Clang 14, it works fine and doesn't loop. If you use Clang 15+ without the -O3 flag, it works fine. If you use gcc, it's fine. If you remove the loop, it's fine. If you modify some lines, for example by removing the line generic = (GenericInterface*)((uint8_t*)generic + generic->size);, it's fine.

It seems that there might be an optimisation issue here which was introduced in Clang 15? Am I doing something illegal and suffering the consequences of it? I know there are less problematic ways to do this, but I'm purely interested in knowing what's going on here, and not really looking for alternative ways to do this.


Solution

  • You are definitely violating the strict aliasing rule. You can even make the issue go away by adding -fno-strict-aliasing to the compiler flags, but performance will degrade and this is still UB. You need to use some union type to tell the compiler to expect that any of these types may alias.

    #include <cstdio>
    #include <cstdint>
    
    struct GenericInterface
    {
        uint8_t size;
    } __attribute__((packed));
    
    struct SpecificInterface
    {
        uint8_t size;
        uint8_t setting;
    } __attribute__((packed));
    
    int main()
    {
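        // Wrapper around an anonymous union listing every struct type
        // that may be viewed through the same pointer.
        struct alias_union {
            union { GenericInterface g; SpecificInterface s; };
        };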
    
        uint8_t setting = 5;
        uint8_t size = (uint8_t)sizeof(SpecificInterface);
        SpecificInterface specific { size, setting };
    
        alias_union* generic = (alias_union*)&specific;
        alias_union* genericEnd = (alias_union*)((uint8_t*)generic + size);
    
        printf("Starts off %u\n", generic->g.size);
        while (generic < genericEnd)
        {
            printf("Checking %u\n", generic->g.size);
            if (generic->g.size == size)
                break;
            generic = (alias_union*)((uint8_t*)generic + generic->g.size);
        }
    
        printf("Done\n");
    
        return 0;
    }
    

    I am not entirely convinced that this is 100% defined behavior, but it makes it harder for the compiler to ignore the aliasing possibility. I would recommend using generic programming instead of this approach.