
Why doesn't my ARM LDREX/STREX C function work?


I wrote a claim_lock function in C, following the "Barrier Litmus Tests and Cookbook" document. I examined the generated code and it all looks correct, but it doesn't work.

// This code follows section 7.2 of PRD03-GENC-007826:
// "Acquiring and Releasing a Lock"
static inline void claim_lock( uint32_t volatile *lock )
{
  uint32_t failed = 1;
  uint32_t value;

  while (failed) {
    asm volatile ( "ldrex %[value], [%[lock]]"
                   : [value] "=&r" (value)
                   : [lock] "r" (lock) );
    if (value == 0) {
      // The failed and lock registers are not allowed to be the same, so
      // pretend to gcc that the lock pointer may be written as well as read.

      asm volatile ( "strex %[failed], %[value], [%[lock]]"
                     : [failed] "=&r" (failed)
                     , [lock] "+r" (lock)
                     : [value] "r" (1) );
    }
    else {
      asm ( "clrex" );
    }
  }
  // The "memory" clobber also stops gcc moving accesses across the barrier.
  asm volatile ( "dmb sy" ::: "memory" );
}

Generated code (gcc):

1000:       e3a03001        mov     r3, #1
1004:       e1902f9f        ldrex   r2, [r0]
1008:       e3520000        cmp     r2, #0
100c:       1a000004        bne     1024 <claim_lock+0x24>
1010:       e1802f93        strex   r2, r3, [r0]
1014:       e3520000        cmp     r2, #0
1018:       1afffff9        bne     1004 <claim_lock+0x4>
101c:       f57ff05f        dmb     sy
1020:       e12fff1e        bx      lr
1024:       f57ff01f        clrex
1028:       eafffff5        b       1004 <claim_lock+0x4>

Corresponding release function:

static inline void release_lock( uint32_t volatile *lock )
{
  // Ensure that any changes made while holding the lock are
  // visible before the lock is seen to have been released
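  // visible before the lock is seen to have been released; the "memory"
  // clobber stops gcc sinking those stores below the barrier.
  asm volatile ( "dmb sy" ::: "memory" );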
  *lock = 0;
}
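For comparison, here is a minimal sketch of the same lock written with C11 `<stdatomic.h>` instead of hand-written inline assembly. This is a swapped-in technique, not the cookbook's code: an acquire exchange to claim and a release store to free. The names `claim_lock_c11` and `release_lock_c11` are hypothetical. On 32-bit ARM, GCC typically lowers the exchange to an exclusive load/store loop plus a barrier, so the hardware requirements described in the solution (MMU, cacheable memory, SMP enabled) would still apply.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical C11 counterparts of claim_lock/release_lock.
 * Same convention as the question: 0 = free, 1 = held. */
static inline void claim_lock_c11(_Atomic uint32_t *lock)
{
    /* Try to swap 0 -> 1; acquire ordering keeps the critical
     * section's accesses from being moved before the claim. */
    while (atomic_exchange_explicit(lock, 1, memory_order_acquire) != 0) {
        /* Lock is held: spin on plain relaxed loads, which do not
         * use the exclusive monitor, before trying the swap again. */
        while (atomic_load_explicit(lock, memory_order_relaxed) != 0) {
        }
    }
}

static inline void release_lock_c11(_Atomic uint32_t *lock)
{
    /* Release ordering plays the role of the "dmb sy" before the store. */
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

Spinning on a relaxed load rather than retrying the exchange immediately is a common design choice: it avoids repeatedly taking exclusive ownership of the lock's cache line while another core holds the lock.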

It works in QEMU, but on real hardware (a Raspberry Pi 3, Cortex-A53) it either hangs or lets every core "claim" the so-called lock.


Solution

  • The LDREX instruction will hang the core (unless my test failed to report an exception) if:

    • the MMU is not enabled, or
    • the virtual memory area containing the lock is not cached.

  • The cores will appear to ignore each other's claims if:

    • symmetric multiprocessing (SMP) has not been enabled.
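A quick way to rule out the first two causes on a bare-metal 32-bit build is to read SCTLR and check the M bit (bit 0, MMU enable) and the C bit (bit 2, data cache enable) before calling claim_lock. This is a minimal sketch under some assumptions: the helper names are hypothetical, the code runs at PL1 (in Hyp mode the relevant register would be HSCTLR instead), and non-ARM builds use a stub so the bit logic can be tested on a host. The C bit is only a global enable; the page holding the lock must also be mapped as cacheable Normal memory, which this check cannot see.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_M (UINT32_C(1) << 0) /* MMU enable */
#define SCTLR_C (UINT32_C(1) << 2) /* data/unified cache enable */

/* Pure helper: true only if both the MMU and the data cache are
 * enabled according to the given SCTLR value. */
static inline bool sctlr_allows_exclusives(uint32_t sctlr)
{
    return (sctlr & SCTLR_M) && (sctlr & SCTLR_C);
}

/* Hypothetical reader; only meaningful at PL1 on 32-bit ARM. */
static inline uint32_t read_sctlr(void)
{
#if defined(__arm__)
    uint32_t v;
    __asm__ __volatile__ ("mrc p15, 0, %0, c1, c0, 0" : "=r" (v));
    return v;
#else
    return 0; /* host stub so the file still builds off-target */
#endif
}
```

Usage on the target would be something like `if (!sctlr_allows_exclusives(read_sctlr())) { /* don't call claim_lock yet */ }`.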

The SMP enable mechanism seems to vary from device to device and is outside the scope of the ARM ARM, so check the Technical Reference Manual (TRM) for the particular core.

For the Cortex-A53, the bit to set is SMPEN, bit 6 of the CPU Extended Control Register, CPUECTLR.

Earlier devices, for example the ARM11 MPCore, use bit 5 of the Auxiliary Control Register instead, and there is also the Snoop Control Unit (SCU) to consider. I don't have such a device, but that documentation is where I first noticed an SMP/nAMP bit.
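A minimal sketch of setting SMPEN on a Cortex-A53, to be checked against the Cortex-A53 TRM before use. Assumptions: in AArch32 state (which matches the A32 code in the question) CPUECTLR is a 64-bit register accessed with `MRRC`/`MCRR p15, 1, <Rt>, <Rt2>, c15`, and in AArch64 state it is CPUECTLR_EL1, encoded as `S3_1_C15_C2_1`; the write must be permitted from the current exception level (EL3/EL2 may need to grant access); and SMPEN should be set on each core before that core's caches and MMU are enabled. The function names are hypothetical, and a non-ARM build compiles to a no-op so the bit arithmetic can be checked on a host.

```c
#include <stdint.h>

#define CPUECTLR_SMPEN (UINT64_C(1) << 6) /* SMPEN, bit 6 on Cortex-A53 */

/* Pure helper: return the register value with SMPEN set,
 * leaving every other bit unchanged. */
static inline uint64_t cpuectlr_with_smpen(uint64_t cpuectlr)
{
    return cpuectlr | CPUECTLR_SMPEN;
}

/* Hypothetical: read-modify-write CPUECTLR on the calling core.
 * Call on each core before enabling its data cache and MMU. */
static inline void enable_smp_coherency(void)
{
#if defined(__arm__)
    uint64_t v;
    /* %Q0 / %R0 select the low / high register of a 64-bit operand. */
    __asm__ __volatile__ ("mrrc p15, 1, %Q0, %R0, c15" : "=r" (v));
    v = cpuectlr_with_smpen(v);
    __asm__ __volatile__ ("mcrr p15, 1, %Q0, %R0, c15" : : "r" (v));
    __asm__ __volatile__ ("isb" : : : "memory");
#elif defined(__aarch64__)
    uint64_t v;
    /* Implementation-defined register: use its generic encoding. */
    __asm__ __volatile__ ("mrs %0, S3_1_C15_C2_1" : "=r" (v));
    v = cpuectlr_with_smpen(v);
    __asm__ __volatile__ ("msr S3_1_C15_C2_1, %0" : : "r" (v));
    __asm__ __volatile__ ("isb" : : : "memory");
#else
    /* Host build: nothing to do. */
#endif
}
```

The read-modify-write keeps the other implementation-defined bits in CPUECTLR as the firmware left them, and the ISB makes sure the new setting takes effect before any following cache or MMU configuration.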