The normal practice is to wrap the CAS instruction in a while loop on platforms that support CAS instructions. But platforms such as SPARC don't have atomic CAS instructions.
SPARC v8 (32-bit) and earlier lack CAS, but v9 (64-bit) does have CAS.
For a spin-lock, v7 and v8 provide LDSTUB, which is an atomic read-modify-write of an unsigned byte: it returns the old value and writes 0xFF. That does the lock phase of a spin-lock. An ordinary write of 0 (or anything not 0xFF) will unlock when using TSO; for PSO you need an STBAR before the write. [There is also the SWAP atomic read-modify-write, which can be used in the same way.]
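As a rough sketch of the lock/unlock pattern just described, here it is in portable C11 rather than SPARC assembler. The assumption is that atomic_flag_test_and_set stands in for LDSTUB (both return the old value and leave the location "set"), and that the release store stands in for the STBAR-then-write unlock. The names tas_lock, tas_acquire, tas_try_acquire and tas_release are made up for illustration.

```c
#include <stdatomic.h>

/* Hypothetical lock type: the atomic_flag plays the part of the LDSTUB byte. */
typedef struct { atomic_flag held; } tas_lock;

#define TAS_LOCK_INIT { ATOMIC_FLAG_INIT }

/* One attempt: returns 1 if we took the lock, 0 if it was already held. */
int tas_try_acquire(tas_lock *l)
{
    /* test-and-set returns the previous state, as LDSTUB returns the old byte */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Lock phase: spin until the previous state was "clear". */
void tas_acquire(tas_lock *l)
{
    while (!tas_try_acquire(l)) {
        /* busy-wait */
    }
}

/* Unlock phase: a store with release ordering, so writes made inside the
   critical section are visible before the lock is seen as free. */
void tas_release(tas_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A caller would bracket the critical section with tas_acquire(&l) and tas_release(&l).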
To implement CAS (and Fetch-Op) operations on v7/v8 you need an auxiliary spin-lock.
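A minimal sketch of what that looks like, assuming a single global auxiliary lock (aux_lock) that guards every emulated atomic word. The function names emulated_cas and emulated_fetch_add are hypothetical, and a real implementation might well hash addresses onto several locks instead of using one.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: one auxiliary spin-lock shared by all emulated operations. */
static atomic_flag aux_lock = ATOMIC_FLAG_INIT;

static void aux_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&aux_lock, memory_order_acquire)) {
        /* busy-wait */
    }
}

static void aux_release(void)
{
    atomic_flag_clear_explicit(&aux_lock, memory_order_release);
}

/* Compare-and-swap: store desired only if *addr still equals expected. */
bool emulated_cas(int *addr, int expected, int desired)
{
    bool swapped = false;
    aux_acquire();
    if (*addr == expected) {
        *addr = desired;
        swapped = true;
    }
    aux_release();
    return swapped;
}

/* Fetch-and-add: returns the value that was there before the addition. */
int emulated_fetch_add(int *addr, int delta)
{
    aux_acquire();
    int old = *addr;
    *addr = old + delta;
    aux_release();
    return old;
}
```

This is only atomic with respect to other code that goes through the same lock, so every access to such a word has to use these helpers.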
More generally:
For "modern" devices (as noted in the comments), if CAS is not supported then some form of LL/SC (load-linked/store-conditional) probably is...
...and a CAS operation can be synthesized using LL/SC. [FWIW: LL/SC is more general than CAS and avoids the dreaded ABA problem that straight CAS is prone to :-(]
but otherwise, once you have a spin-lock you can simulate most things...
...but if the thread holding a spin-lock goes to sleep, everybody gets to wait :-(
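To illustrate the LL/SC point above: portable C has no LL/SC primitives, so what follows is only a toy, single-threaded model of one memory word with a reservation bit. It is not real hardware behaviour. All the names (llsc_word, load_linked, store_conditional, plain_store, cas_via_llsc) are invented for the sketch. It shows the usual retry loop for building CAS, and why an intervening A -> B -> A sequence makes the store-conditional fail where a value comparison would not notice.

```c
#include <stdbool.h>

/* Toy model: a word plus a flag saying "a load-linked is still valid". */
typedef struct { int value; bool reserved; } llsc_word;

/* Load-linked: read the word and take out a reservation on it. */
int load_linked(llsc_word *w)
{
    w->reserved = true;
    return w->value;
}

/* Any other store to the word (e.g. by another CPU) kills the reservation. */
void plain_store(llsc_word *w, int v)
{
    w->value = v;
    w->reserved = false;
}

/* Store-conditional: succeeds only if the reservation survived. */
bool store_conditional(llsc_word *w, int v)
{
    if (!w->reserved)
        return false;
    w->value = v;
    w->reserved = false;
    return true;
}

/* CAS synthesized from LL/SC: retry whenever the reservation is lost. */
bool cas_via_llsc(llsc_word *w, int expected, int desired)
{
    for (;;) {
        if (load_linked(w) != expected)
            return false;               /* value differs: CAS fails */
        if (store_conditional(w, desired))
            return true;                /* nobody interfered: CAS succeeds */
    }
}
```

In this model, a load_linked followed by two plain_store calls that put the original value back still causes the following store_conditional to fail, which is the ABA case a plain compare would miss.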
Machines (now historic) which provide neither LL/SC nor hardware support for a spin-lock may well have sequentially consistent memory, in which case you can implement a spin-lock in software (using Peterson's Algorithm, or Burns', or others').
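For what it's worth, here is a sketch of Peterson's Algorithm for two threads. It assumes C11 atomics with their default sequentially consistent ordering as a stand-in for a machine whose memory is sequentially consistent anyway. The type and function names (peterson_lock, peterson_init, peterson_acquire, peterson_release) are made up, and self must be 0 or 1.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Two-thread lock: each thread has a "wants" flag, plus a shared turn. */
typedef struct {
    atomic_bool wants[2];
    atomic_int  turn;
} peterson_lock;

void peterson_init(peterson_lock *p)
{
    atomic_init(&p->wants[0], false);
    atomic_init(&p->wants[1], false);
    atomic_init(&p->turn, 0);
}

void peterson_acquire(peterson_lock *p, int self)
{
    int other = 1 - self;

    atomic_store(&p->wants[self], true);   /* announce interest */
    atomic_store(&p->turn, other);         /* give way to the other thread */

    /* Wait while the other thread is interested and it is its turn.
       These default (seq_cst) operations are what make this correct. */
    while (atomic_load(&p->wants[other]) && atomic_load(&p->turn) == other) {
        /* busy-wait */
    }
}

void peterson_release(peterson_lock *p, int self)
{
    atomic_store(&p->wants[self], false);
}
```

With weaker orderings (or plain variables on a TSO/PSO machine) the store to wants[self] can be delayed past the load of wants[other], and mutual exclusion breaks.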