Why Java StampedLock faster than ReentrantReadWriteLock

This comparison of StampedLock and other locks show that StampedLock is the fastest as contention rises. However, this, and various other articles, none list out why it is faster. It seems to be using the same CAS semantics as other type of locks? Can anyone explain why it is fastest as contention rises?
For example, below in this code, the writeLock blocks not only other writeLocks, but also readLocks. I am not concerned with optimisticReadLocks etc at this point. Just plain writeLock.. what is advantage and how is it faster than ReentrantLock (plus it doesn't even have reentrancy).

    public static void main(String[]args)throws Exception{
    StampedLock lk = new StampedLock();

    new Thread(() -> {
        long stamp = lk.writeLock();
        try {
            out.println("locked.. sleeping");
            TimeUnit.SECONDS.sleep(5);
            lk.unlock(stamp);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }).start();

    new Thread(() -> {
        long stamp = lk.writeLock();
        try {
            out.println("locked.. sleeping");
            TimeUnit.SECONDS.sleep(5);
            lk.unlock(stamp);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }).start();

}

Solution

To be clear, the StampedLock is much faster on reads when contention rises. Writers are a bit faster, but not as close to as fast as reads. I'll explain why.

Most of the time, with read-write-locks, writes are way fewer. However, even despite this, each time you acquire a readLock() on the ReentrantReadWriteLock you have to increment a reader count. This forces a cache invalidation on all cores using this lock.

Under heavy contention this can cause a significant slow down when doing reads. Reads should be fast, when doing a readLock() we shouldn't have to update a variable, it's counter intuitive.

What if, instead, we have a stamp or let's say, version? One that only updates once per read iteration.

What this does for us is, under contention, if only one thread updates a stamp value (let's say after a write) all reading threads will be executing cache hits when wanted to read on the lock. This prohibits cache invalidation and allows the lock to execute in a more appropriate fashion than a RRWL.

The pattern for using a StampedLock is similar to a lock-free algorithm when using tryOptimisticRead (like CAS)

Get a stamp
Read a value
Has the stamp changed?
- Yes, try again or issue a blocking read
- No, we are good, let's move on.