I wrote a claim_lock function in C, following ARM's "Barrier Litmus Tests and Cookbook" document. I examined the generated code and it all looked correct, but it didn't work.
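```c
#include <stdint.h>

// This code conforms to section 7.2 of PRD03-GENC-007826:
// "Acquiring and Releasing a Lock"
static inline void claim_lock( uint32_t volatile *lock )
{
    uint32_t failed = 1;
    uint32_t value;
    while (failed) {
        // Load-exclusive: read the lock word and tag the address in the exclusive monitor.
        asm volatile ( "ldrex %[value], [%[lock]]"
                     : [value] "=&r" (value)
                     : [lock] "r" (lock)
                     : "memory" );
        if (value == 0) {
            // The failed and lock registers are not allowed to be the same, so
            // pretend to gcc that the lock pointer may be written as well as read.
            // strex writes 0 to failed on success, 1 if exclusivity was lost.
            asm volatile ( "strex %[failed], %[value], [%[lock]]"
                         : [failed] "=&r" (failed)
                         , [lock] "+r" (lock)
                         : [value] "r" (1)
                         : "memory" );
        }
        else {
            // Lock already held: clear the exclusive monitor and try again.
            asm ( "clrex" );
        }
    }
    // Keep critical-section accesses after the lock is taken. The "memory"
    // clobber also stops gcc from moving them above this barrier.
    asm volatile ( "dmb sy" ::: "memory" );
}
```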
Generated code (gcc):
```
1000: e3a03001 mov r3, #1
1004: e1902f9f ldrex r2, [r0]
1008: e3520000 cmp r2, #0
100c: 1a000004 bne 1024 <claim_lock+0x24>
1010: e1802f93 strex r2, r3, [r0]
1014: e3520000 cmp r2, #0
1018: 1afffff9 bne 1004 <claim_lock+0x4>
101c: f57ff05f dmb sy
1020: e12fff1e bx lr
1024: f57ff01f clrex
1028: eafffff5 b 1004 <claim_lock+0x4>
```
Corresponding release function:
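```c
#include <stdint.h>

static inline void release_lock( uint32_t volatile *lock )
{
    // Ensure that any changes made while holding the lock are
    // visible before the lock is seen to have been released.
    // The "memory" clobber also stops gcc from sinking those
    // stores below the barrier.
    asm volatile ( "dmb sy" ::: "memory" );
    *lock = 0;
}
```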
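For comparison, here is a minimal sketch of the same lock written with C11 `<stdatomic.h>` instead of inline assembly. The function names are hypothetical. As far as I know, GCC targeting 32-bit ARMv7 typically lowers the acquire compare-exchange to an LDREX/STREX loop followed by a DMB, so this version relies on the same exclusive-access hardware and would be subject to the same hardware configuration requirements; it only removes the hand-written asm as a possible source of error.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C11 equivalents of claim_lock/release_lock.
   The lock word is 0 when free and 1 when held. */

static inline bool try_claim_lock_c11(atomic_uint_least32_t *lock)
{
    uint_least32_t expected = 0;
    /* Acquire on success: critical-section accesses cannot move before this. */
    return atomic_compare_exchange_strong_explicit(lock, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static inline void claim_lock_c11(atomic_uint_least32_t *lock)
{
    while (!try_claim_lock_c11(lock)) {
        /* Wait with plain loads, retrying the exchange only once the lock looks free. */
        while (atomic_load_explicit(lock, memory_order_relaxed) != 0) {
        }
    }
}

static inline void release_lock_c11(atomic_uint_least32_t *lock)
{
    /* Debug check: only the holder should release the lock. */
    assert(atomic_load_explicit(lock, memory_order_relaxed) == 1);
    /* Release: critical-section accesses become visible before the lock reads as free. */
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

The acquire/release orderings play the role of the explicit `dmb sy` in the asm version; the compiler chooses the barriers.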
It worked in QEMU, but on real hardware (a Raspberry Pi 3, Cortex-A53) it either hung or let all the cores "claim" the so-called lock at the same time.
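The second symptom (every core "claiming" the lock) can be detected with a simple stress test: several threads increment a shared, non-atomic counter while holding the lock, and any break in mutual exclusion shows up as lost increments. This is a minimal hosted sketch, assuming POSIX threads and using a C11 `atomic_flag` spinlock as a stand-in for the lock under test; on bare metal the threads would be the cores and claim_lock/release_lock would replace the flag operations. All names here are hypothetical. A passing run doesn't prove a lock correct, but a short count proves it broken.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_THREADS 4
#define ITERATIONS  100000

static atomic_flag test_lock = ATOMIC_FLAG_INIT;
/* Deliberately not atomic: volatile forces a real load and store per increment. */
static volatile unsigned long shared_counter;

static void *hammer(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        /* Stand-in for claim_lock(). */
        while (atomic_flag_test_and_set_explicit(&test_lock, memory_order_acquire)) {
        }
        shared_counter = shared_counter + 1;  /* racy read-modify-write without the lock */
        /* Stand-in for release_lock(). */
        atomic_flag_clear_explicit(&test_lock, memory_order_release);
    }
    return NULL;
}

/* Returns true if every thread started and no increments were lost. */
static bool mutual_exclusion_holds(void)
{
    pthread_t threads[NUM_THREADS];
    int created = 0;

    shared_counter = 0;
    for (; created < NUM_THREADS; created++)
        if (pthread_create(&threads[created], NULL, hammer, NULL) != 0)
            break;
    for (int t = 0; t < created; t++)
        pthread_join(threads[t], NULL);

    return created == NUM_THREADS &&
           shared_counter == (unsigned long)NUM_THREADS * ITERATIONS;
}
```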
The LDREX instruction will hang the core (unless my test failed to report an exception) if:
The cores will appear to ignore each other's claims if:
The SMP enable mechanism seems to vary from device to device; check the TRM for the particular core, as it's outside the scope of the ARM ARM.
For the Cortex-A53, the bit to set is SMPEN, bit 6 of the CPU Extended Control Register (CPUECTLR).
Earlier devices use bit 5 of the Auxiliary Control Register instead (the ARM11 MPCore, for example), where there's also the Snoop Control Unit (SCU) to consider. I don't have such a device, but its documentation is where I first noticed an SMP/nAMP bit.
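A minimal sketch of setting SMPEN on the Cortex-A53, assuming the code runs at EL1 or higher (PL1 in AArch32) and that access to CPUECTLR has not been trapped or restricted by EL2/EL3 (my reading of the A53 TRM is that EL1 writes depend on ACTLR_EL3/ACTLR_EL2 settings). The encodings are taken from the A53 TRM as I understand it: CPUECTLR_EL1 is the implementation-defined register S3_1_C15_C2_1 in AArch64, and in AArch32 it is a 64-bit register accessed with MRRC/MCRR p15, 1, c15. The TRM also says SMPEN should be set before the caches and MMU are enabled. The function names are hypothetical; on a non-ARM host only the pure bit-manipulation helper is compiled.

```c
#include <assert.h>
#include <stdint.h>

/* Cortex-A53: CPUECTLR.SMPEN is bit 6. */
#define CPUECTLR_SMPEN (UINT64_C(1) << 6)

/* Pure helper: returns the register value with SMPEN set, other bits unchanged. */
static inline uint64_t cpuectlr_with_smpen(uint64_t cpuectlr)
{
    return cpuectlr | CPUECTLR_SMPEN;
}

#if defined(__aarch64__)
/* AArch64: read-modify-write CPUECTLR_EL1 via its generic encoding. */
static inline void enable_smp_coherency(void)
{
    uint64_t v;
    __asm__ volatile ("mrs %0, S3_1_C15_C2_1" : "=r" (v));
    v = cpuectlr_with_smpen(v);
    __asm__ volatile ("msr S3_1_C15_C2_1, %0\n\tisb" : : "r" (v) : "memory");
}
#elif defined(__arm__)
/* AArch32: CPUECTLR is 64 bits wide; MRRC returns the low word in the first register. */
static inline void enable_smp_coherency(void)
{
    uint32_t lo, hi;
    __asm__ volatile ("mrrc p15, 1, %0, %1, c15" : "=r" (lo), "=r" (hi));
    uint64_t v = cpuectlr_with_smpen(((uint64_t)hi << 32) | lo);
    lo = (uint32_t)v;
    hi = (uint32_t)(v >> 32);
    __asm__ volatile ("mcrr p15, 1, %0, %1, c15\n\tisb" : : "r" (lo), "r" (hi) : "memory");
}
#endif
```

Each core would call enable_smp_coherency() itself early in its boot path, since CPUECTLR is a per-core register.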