Tags: java, multithreading, caching, concurrency, volatile

Setting an AtomicBoolean again


I am using an AtomicBoolean to enforce volatile visibility between threads. One thread is updating the value, another thread is only reading it.

Say the current value is true. Now suppose the write thread sets it to true again:

final AtomicBoolean b = new AtomicBoolean(); // shared between threads

b.set(true);
// ... some time later
b.set(true);

After this 'dummy' set(true), is there a performance penalty when the read thread calls get()? Does the read thread have to re-read and cache the value?

If that is the case, the write thread could have done:

b.compareAndSet(false, true);

This way, the read thread only has to invalidate for real changes.
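To make the two variants concrete, here is a minimal sketch of what each call does to the stored value. The class name and the two helper methods are made up for illustration; only AtomicBoolean and its set(), get() and compareAndSet() methods come from the JDK. It shows observable behaviour only and says nothing about cache traffic.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class RedundantSetDemo {
    // Hypothetical helper: unconditional volatile write, even if the value is already true.
    static void publishWithSet(AtomicBoolean flag) {
        flag.set(true);
    }

    // Hypothetical helper: returns true only if the flag actually flipped from false to true.
    static boolean publishWithCas(AtomicBoolean flag) {
        return flag.compareAndSet(false, true);
    }

    public static void main(String[] args) {
        AtomicBoolean b = new AtomicBoolean(); // initial value is false

        System.out.println(publishWithCas(b)); // true: value changed false -> true
        System.out.println(publishWithCas(b)); // false: already true, value unchanged

        publishWithSet(b);                     // redundant set, value stays true
        System.out.println(b.get());           // true
    }
}
```

Either way the reader sees true afterwards; the question is only about what the redundant call costs.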


Solution

  • compareAndSet():

    public final boolean compareAndSet(boolean expect, boolean update) {
        int e = expect ? 1 : 0;
        int u = update ? 1 : 0;
        return unsafe.compareAndSwapInt(this, valueOffset, e, u);
    }
    

    compareAndSwapInt() is native already:

    UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapInt(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jint e, jint x))
      UnsafeWrapper("Unsafe_CompareAndSwapInt");
      oop p = JNIHandles::resolve(obj);
      jint* addr = (jint *) index_oop_from_field_offset_long(p, offset);
      return (jint)(Atomic::cmpxchg(x, addr, e)) == e;
    UNSAFE_END
    

    Atomic::cmpxchg, in turn, is a stub generated early in JVM startup:

      address generate_atomic_cmpxchg() {
        StubCodeMark mark(this, "StubRoutines", "atomic_cmpxchg");
        address start = __ pc();
    
        __ movl(rax, c_rarg2);
        if ( os::is_MP() ) __ lock();
        __ cmpxchgl(c_rarg0, Address(c_rarg1, 0));
        __ ret(0);
    
        return start;
      }
    

    cmpxchgl() emits the x86 instruction bytes (there is also a longer, legacy code path, which I have not copied here):

     InstructionMark im(this);
     prefix(adr, reg);
     emit_byte(0x0F);
     emit_byte(0xB1);
     emit_operand(reg, adr);
    

    0F B1 is the opcode for CMPXCHG. In the stub above, if ( os::is_MP() ) __ lock(); emits a LOCK prefix on multiprocessor machines, so practically everywhere (I will skip quoting lock(); it emits a single F0 byte).

    And as the CMPXCHG docs say:

    This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processor’s bus, the destination operand receives a write cycle without regard to the result of the comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination. (The processor never produces a locked read without also producing a locked write.)

    So on a multiprocessor x86 machine, even a no-op CAS performs a write, and therefore affects the cache line.
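
Given that, if the goal is to avoid touching the cache line when nothing changes, one possible approach is to read first and write only on a real change. The sketch below assumes a single writer thread, as in the question; with several writers the check and the set would race. The class and method names are hypothetical, and whether this makes a measurable difference is something to benchmark rather than assume.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CheckThenSet {
    // Hypothetical helper, intended for a single writer thread only.
    // Performs a volatile read first and issues a write only when the value differs.
    static boolean setIfChanged(AtomicBoolean flag, boolean newValue) {
        if (flag.get() == newValue) {
            return false; // nothing to do, no store is issued
        }
        flag.set(newValue);
        return true;
    }

    public static void main(String[] args) {
        AtomicBoolean b = new AtomicBoolean(); // initial value is false

        System.out.println(setIfChanged(b, true)); // true: value changed
        System.out.println(setIfChanged(b, true)); // false: redundant update skipped
        System.out.println(b.get());               // true
    }
}
```

Unlike a CAS, a plain get() is only a load and does not require exclusive ownership of the cache line, so the redundant case should not disturb the reader's cached copy.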