One popular solution to the ABA problem in lock-free data structures is to tag pointers with an additional monotonically incrementing tag.
struct aba {
void *ptr;
uint32_t tag;
};
However, this approach has a problem. It is really slow and has huge cache problems. I can obtain a speed-up of twice as much if I ditch the tag field. But this is unsafe?
So my next attempt stuff for 64 bit platforms stuffs bits in the ptr field.
struct aba {
uintptr __ptr;
};
uint32_t get_tag(struct aba aba) { return aba.__ptr >> 48U; }
But someone said to me that only 16 bits for the tag is unsafe. My new plan is to use pointer alignment to cache-lines to stuff more tag bits in but I want to know if that'll work.
If that fails to work my next plan is to use Linux's MAP_32BIT
mmap
flag to allocated data so I only need 32 bits of pointer space.
How many bits do I need for the ABA tag in lock-free data-structures?
The amount of tag bits that is practically safe can be estimated based on the preemption time and the frequency of pointer modifications.
To remind, the ABA problem happens when a thread reads the value it wants to change with compare-and-swap, gets preempted, and when it resumes the actual value of the pointer happens to be equal to what the thread read before. Therefore the compare-and-swap operation may succeed despite data structure modifications possibly done by other threads during the preemption time.
The idea of adding the monotonically incremented tag is to make each modification of the pointer unique. For it to succeed, increments must produce unique tag values during the time when a modifying thread might be preempted; i.e. for guaranteed correctness the tag may not wraparound during the whole preemption time.
Let's assume that preemption lasts a single OS scheduling time slice, which is typically tens to hundreds of milliseconds. The latency of CAS on modern systems is tens to hundreds of nanoseconds. So rough worst-case estimate is that there might be millions of pointer modifications while a thread is preempted, and so there should be 20+ bits in the tag in order for it to not wraparound.
In practice it can be possible to make a better estimate for a particular real use case, based on known frequency of CAS operations. One also need to estimate the worst-case preemption time more accurately; for example, a low-priority thread preempted by a higher-priority job might end up with much longer preemption time.