c assembly arm embedded memory-alignment

Arm Cortex-M4 LDRD instruction causing hardfault

I see that LDRD (load doubleword) is listed in the Cortex-M3 errata, but I can't find anything similar for the Cortex-M4, and in any case no interrupt appears to occur during execution. I'm working with an M4 microcontroller and passing data to and from a host. It's handy to work with the data in the form the host (same architecture) uses. For example, if the host passes an unsigned 16-bit integer, I read it as a uint16_t, even though it arrives in a two-byte array, data_in:

uint16_t some_data = *(uint16_t *)data_in;

When I try to do this with an unsigned 64-bit integer, however, I get a hardfault on the generated LDRD instruction:

uint64_t some_data = *(uint64_t *)data_in;

generates:

9B01        ldr r3, [sp, #4]
330C        adds r3, #12
E9D32300    ldrd r2, r3, [r3, #0]
4902        ldr r1, =SOME_ADDR <some_data>
E9C12306    strd r2, r3, [r1, #24]

and I hardfault on E9D32300 ldrd r2, r3, [r3, #0].

So the question is: other than possible portability issues (not a problem here), am I doing something fundamentally wrong by pointing to the location of a uint64_t and trying to read it as a uint64_t? Either way, has anyone seen an erratum for this instruction reported anywhere? I'm not finding one in the official docs.

Also, just for completeness, this much less fun code works fine:

uint64_t some_data = ((uint64_t)data_in[7] << 8*7) |
                     ((uint64_t)data_in[6] << 8*6) |
                     ((uint64_t)data_in[5] << 8*5) |
                     ((uint64_t)data_in[4] << 8*4) |
                     ((uint64_t)data_in[3] << 8*3) |
                     ((uint64_t)data_in[2] << 8*2) |
                     ((uint64_t)data_in[1] << 8*1) |
                     ((uint64_t)data_in[0] << 8*0);

Solution

The ARMv7-M Architecture Reference Manual, section A3.2.1 "Alignment behavior", says:

The following data accesses always generate an alignment fault:

Non halfword-aligned LDREXH and STREXH.
Non word-aligned LDREX and STREX.
Non word-aligned LDRD, LDMIA, LDMDB, POP, LDC, VLDR, VLDM, and VPOP.
Non word-aligned STRD, STMIA, STMDB, PUSH, STC, VSTR, VSTM, and VPUSH.
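
The word-alignment condition in that list can be checked at run time. Here is a minimal sketch; the helper name is_word_aligned is hypothetical and not part of the question's code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when p sits on a 4-byte (word) boundary,
 * the alignment the manual requires for LDRD/STRD. */
static inline bool is_word_aligned(const void *p)
{
    return ((uintptr_t)p & 0x3u) == 0;
}
```

This only tells you whether the address meets the hardware requirement quoted above. It does not make the cast valid C: converting to a pointer type whose alignment requirement isn't met is still undefined behavior.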

So unless you know that data_in is at least 32-bit (word) aligned, you can't cast it to (uint64_t *) and expect the dereference to work.
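
A common alternative when the alignment isn't known is to copy the bytes with memcpy rather than dereference a cast pointer. Below is a minimal sketch, assuming host and target share the same byte order as stated in the question; the helper name load_u64 is hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a uint64_t from a byte buffer at any alignment.
 * memcpy makes no alignment assumption about src, so the compiler cannot
 * assume a word-aligned address for this load. */
static inline uint64_t load_u64(const uint8_t *src)
{
    uint64_t value;

    assert(src != NULL);               /* debug-time sanity check only */
    memcpy(&value, src, sizeof value); /* byte order is taken as-is */
    return value;
}
```

With this, the failing line would become something like uint64_t some_data = load_u64(data_in);. Optimizing compilers typically inline a fixed-size memcpy like this one instead of calling the library routine, but the exact instructions generated depend on the toolchain and flags, so check the disassembly if it matters.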