Search code examples
linuxgccarmundefined-behaviormemory-alignment

Function for unaligned memory access on ARM


I am working on a project where data is read from memory. Some of this data are integers, and there was a problem accessing them at unaligned addresses. My idea would be to use memcpy for that, i.e.

uint32_t readU32(const void* ptr)
{
    uint32_t n;
    memcpy(&n, ptr, sizeof(n));
    return n;
}

The solution from the project source I found is similar to this code:

uint32_t readU32(const uint32_t* ptr)
{
    union {
        uint32_t n;
        char data[4];
    } tmp;
    const char* cp=(const char*)ptr;
    tmp.data[0] = *cp++;
    tmp.data[1] = *cp++;
    tmp.data[2] = *cp++;
    tmp.data[3] = *cp;
    return tmp.n;
}

So my questions:

  1. Isn't the second version undefined behaviour? The C standard says in 6.2.3.2 Pointers, at 7:

A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned 57) for the pointed-to type, the behavior is undefined.

As the calling code has, at some point, used a char* to handle the memory, there must be some conversion from char* to uint32_t*. Isn't the result of that undefined behaviour, then, if the uint32_t* is not corrently aligned? And if it is, there is no point for the function as you could write *(uint32_t*) to fetch the memory. Additionally, I think I read somewhere that the compiler may expect an int* to be aligned correctly and any unaligned int* would mean undefined behaviour as well, so the generated code for this function might make some shortcuts because it may expect the function argument to be aligned properly.

  1. The original code has volatile on the argument and all variables because the memory contents could change (it's a data buffer (no registers) inside a driver). Maybe that's why it does not use memcpy since it won't work on volatile data. But, in which world would that make sense? If the underlying data can change at any time, all bets are off. The data could even change between those byte copy operations. So you would have to have some kind of mutex to synchronize access to this data. But if you have such a synchronization, why would you need volatile?

  2. Is there a canonical/accepted/better solution to this memory access problem? After some searching I come to the conclusion that you need a mutex and do not need volatile and can use memcpy.

P.S.:

# cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 10 (v7l)
BogoMIPS        : 1581.05
Features        : swp half thumb fastmult vfp edsp neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0xc09
CPU revision    : 10

Solution

  • This code

    uint32_t readU32(const uint32_t* ptr)
    {
        union {
            uint32_t n;
            char data[4];
        } tmp;
        const char* cp=(const char*)ptr;
        tmp.data[0] = *cp++;
        tmp.data[1] = *cp++;
        tmp.data[2] = *cp++;
        tmp.data[3] = *cp;
        return tmp.n;
    }
    

    passes the pointer as a uint32_t *. If it's not actually a uint32_t, that's UB. The argument should probably be a const void *.

    The use of a const char * in the conversion itself is not undefined behavior. Per 6.3.2.3 Pointers, paragraph 7 of the C Standard (emphasis mine):

    A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.
    Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

    The use of volatile with respect to the correct way to access memory/registers directly on your particular hardware would have no canonical/accepted/best solution. Any solution for that would be specific to your system and beyond the scope of standard C.