multithreading scala concurrency jvm locking

How to demonstrate thread safety issue

A coworker told me this Cache object needs to be made thread-safe:

import scala.collection.mutable

object Cache {
    private val data = mutable.Map[Int, String]()

    def get(key: Int): Option[String] = data.get(key)

    def set(key: Int, value: String): Unit = data(key) = value
}

This is my attempt to reproduce the issue:

def expensiveMethod(): String = {
    Thread.sleep(1)
    Thread.currentThread.getName
}

for (i <- 1 to 3) {
    new Thread(() => {
        val cachedValue = Cache.get(1)

        if (cachedValue.nonEmpty) {
            println(s"${Thread.currentThread.getName}: cached value = ${cachedValue}")
        } else {
            val newValue = expensiveMethod()
            Cache.set(1, newValue)

            val finalValue = Cache.get(1)
            println(s"${Thread.currentThread.getName}: final value = ${finalValue}")
        }
    }).start()
}

// Thread-2: final value = Some(Thread-2)
// Thread-3: final value = Some(Thread-2)
// Thread-1: final value = Some(Thread-2)

I'm trying to get my head around it. Three threads modify the data at the same time. Only one wins. Why is that a problem?

I do find it strange that in-between Thread-3 setting and reading the value, apparently Thread-2 has changed it. Is that the crux of the issue?

I tried using a lock but it didn't completely prevent that from happening, though it did seem to reduce the issue:

    def set(key: Int, value: String): Unit = data.synchronized { data(key) = value }

I also tried using concurrent.TrieMap instead of mutable.Map. It also appeared to reduce the issue but without running hundreds of tests I don't know how to be sure.

Solution

in-between Thread-3 setting and reading the value, apparently Thread-2 has changed it. Is that the crux of the issue?

Well, that is one of the issues. Your expensiveMethod can potentially be called several times, which is, at best, wasteful, but could potentially be more problematic if it has side-effects.

Like, if you were to keep statistics about your cache affinity, they'd be all off. Or you could cause a deadlock in the database ... etc.

A bigger (albeit, harder to reproduce on purpose) problem is Map implementation itself is not thread-safe.

This means that if one thread is doing put, while the other one is inside get, the result is undefined, and could even cause the JVM itself to crash.

To avoid the second (bigger) problem, you could just use java's ConcurrentHashMap (or concurrent.TrieMap, though, it would be less efficient). That will guarantee always having a valid (albeit, possibly outdated) result from get, and assurance against random crashes, but does nothing to ease the first issue with potential redundant calls to expensiveMethod.

To avoid that (the wasted calls), you would need to synchronize the entire block:

    val cachedValue = Cache.synchronized { 
       Cache.get(1).getOrElse { 
        val value = expensiveMethod()
        Cache.set(1, value)
        value
       }
    }