java, multithreading, concurrency, thread-safety, java.util.concurrent

Issues with Concurrent Execution and Synchronization in a Custom Java Caching Mechanism


I’m developing a custom caching mechanism for my Java application to cache objects that are expensive to create (e.g., database connections). The idea is to allow concurrent access to the cache while maintaining thread safety. However, I’m encountering issues with thread synchronization, leading to unexpected behavior like overwriting cache entries and threads not seeing the most recent updates.

Below is the simplified version of my cache implementation:

import java.util.HashMap;
import java.util.Map;

public class CacheManager {
    private Map<String, Object> cache = new HashMap<>();

    public synchronized void addToCache(String key, Object value) {
        if (!cache.containsKey(key)) {
            cache.put(key, value);
        }
    }

    public synchronized Object getFromCache(String key) {
        return cache.get(key);
    }

    public synchronized Object removeFromCache(String key) {
        return cache.remove(key);
    }
}

Even with synchronized methods, when I run multiple threads that access addToCache and getFromCache simultaneously, I occasionally get null values or incorrect objects, which should not happen given the current logic. I suspect the issue lies in the way I’m handling synchronization, but I can’t pinpoint the exact problem or how to resolve it.

How can I fix my synchronization logic to ensure thread-safe access to the cache? Are there any best practices I should follow for this kind of scenario?


Solution

  • The code you have cannot possibly serve as a caching mechanism with the safety properties you appear to want. We can trivially prove this by thinking about how one would use this code:

    CacheManager cm = ...;
    String key = "test";
    
    Connection c = (Connection) cm.getFromCache(key);
    if (c == null) {
      c = getConnection(); // expensive operation
      cm.addToCache(key, c);
    }
    

    This simple code is broken. Although getFromCache is synchronized and addToCache is synchronized, the compound get-then-add operation isn't magically synchronized too. Two threads can reach getFromCache at the same time; each goes through it in turn (one waits for the other), both come away with null, and both then make a DB connection. One thread 'wins' the second synchronized battle and stores the connection it made in your cache map; the other thread's addToCache then does nothing, because the key is already present, so the connection it just made is never cached and nothing will ever close() it - a resource leak.

    There is no fixing this based on this design.
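
    To make the interleaving concrete, here is a small demo of that race, assuming the CacheManager class above is on the classpath; the CountDownLatch exists purely to force the bad interleaving to happen on every run instead of only occasionally, and a plain Object stands in for the expensive connection:

    import java.util.concurrent.CountDownLatch;

    public class RaceDemo {
        public static void main(String[] args) throws Exception {
            CacheManager cm = new CacheManager();
            CountDownLatch bothLookedUp = new CountDownLatch(2);

            Runnable task = () -> {
                Object c = cm.getFromCache("test");   // both threads see null here
                bothLookedUp.countDown();
                try {
                    bothLookedUp.await();             // wait until the other thread has also looked up
                } catch (InterruptedException e) {
                    return;
                }
                if (c == null) {
                    c = new Object();                 // stand-in for the expensive getConnection()
                    cm.addToCache("test", c);
                }
                System.out.println(Thread.currentThread().getName() + " uses " + c);
            };

            Thread t1 = new Thread(task, "t1");
            Thread t2 = new Thread(task, "t2");
            t1.start(); t2.start();
            t1.join(); t2.join();

            // Two 'connections' were created; only one of them ended up in the cache.
            System.out.println("cached: " + cm.getFromCache("test"));
        }
    }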

    Instead, you need to completely re-imagine how all this works. You need a 'primitive' (a single operation) that does both in a single go. This is trickier than it sounds: you don't want to create that connection until it is necessary to do so, but you can't determine whether it is necessary. Calling getFromCache, or some isPresentInCache method, gives you an answer that is meaningless by the time you act on it, because you have no guarantee it is still true.

    You can apply the same reasoning to removal: thread A may assume some value is in the cache while thread B removes it in between, or multiple threads may each put their own object in the cache and later each remove it, walking all over one another and leaving your cache manager in unexpected states.

    Thus, the first dilemma is that you don't want to hand a connection to your cache mechanism; you want to hand it a recipe: a recipe to make that connection, so that the cache can run it, in a synchronized fashion, when it is necessary to do so. If multiple threads simultaneously hit a point where they need, say, the connection for key "X" and it isn't in the cache yet, then one arbitrary 'winner' should be allowed to make it, and only afterwards do the other threads get to see and use the now-cached result.

    This gets us into an absolute boatload of misunderstandings you are suffering from:

    Connection isn't like that

    DB connections cannot safely be used from multiple threads, so 'caching' is the wrong name and the wrong concept here; it simply does not apply. You have to get rid of this code and of the name. Instead, you use something called a connection pool: any thread can grab a connection from the pool (and the pool will make one if none are available, or block the thread if some maximum count is reached); that connection is then out of the pool and will not be handed to anybody else. Once that thread is done with it, it returns the connection to the pool, and any thread that next asks the pool for a connection may well get this 'recycled' connection.

    However, this is way more complex than it sounds, and it already sounds complex: connections are stateful affairs; they have per-connection settings and various sub-resources that need closing when you hand one back. They also tend to 'get stuck' if left open for a long time, so you want to check that a recycled connection is still 'live' before handing it out to a new requester.

    It's so complex, you really, really don't want to write it. Instead, use an existing one, like HikariCP.
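
    For reference, a minimal HikariCP setup looks roughly like the sketch below; the JDBC URL, credentials, and pool size are placeholders, not recommendations:

    import java.sql.Connection;

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    public class DatabasePool {
        // One pool for the whole application; it creates, recycles, and health-checks connections for you.
        private static final HikariDataSource POOL;

        static {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl("jdbc:postgresql://localhost:5432/mydb"); // placeholder URL
            config.setUsername("app");                                  // placeholder credentials
            config.setPassword("secret");
            config.setMaximumPoolSize(10);                              // placeholder size
            POOL = new HikariDataSource(config);
        }

        public static void doDatabaseWork() throws Exception {
            // try-with-resources hands the connection back to the pool; it does not really close it.
            try (Connection c = POOL.getConnection()) {
                // use c on this thread only, then let it go back to the pool
            }
        }
    }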

    The same goes for caching

    Even if you only used DB connections as an example and actually have something else in mind, caching as a concept is similarly complicated. You don't want to write this stuff yourself, because you will mess it up. Threading is particularly evil: if you mess up, you usually won't know. Unlike most bugs, a threading issue may not show up for ages and cannot be reliably reproduced or tested for. Hence, it's a bad idea to write this yourself unless you're an expert.

    Use existing solutions instead, such as Guava's very widely used cache builder.
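
    For illustration, a typical Guava LoadingCache looks roughly like this; the key type, value type, and loader body below are placeholders for whatever you actually cache:

    import java.util.concurrent.TimeUnit;

    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheLoader;
    import com.google.common.cache.LoadingCache;

    public class ExpensiveThingCache {
        // The loader is the 'recipe': Guava runs it at most once per key,
        // even when many threads ask for the same missing key at once.
        private final LoadingCache<String, ExpensiveThing> cache = CacheBuilder.newBuilder()
                .maximumSize(100)                              // placeholder eviction policy
                .expireAfterAccess(10, TimeUnit.MINUTES)
                .build(CacheLoader.from(ExpensiveThing::new)); // placeholder recipe

        public ExpensiveThing get(String key) {
            return cache.getUnchecked(key); // loads via the recipe if absent, atomically per key
        }

        // Placeholder for whatever expensive-to-create object you actually cache.
        static class ExpensiveThing {
            ExpensiveThing(String key) { /* expensive construction here */ }
        }
    }

    Note that eviction and expiry, the genuinely hard parts of a real cache, come along for free here.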

    This isn't how you java. Not in 2024.

    Your code is about two decades out of date. Having to cast the objects you get back is quite annoying; you'd use generics, and you'd fashion keys that carry the type, so you don't have to cast and your code is type safe.

    Read up on generics for more information.
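
    As a sketch of what 'keys that carry the type' can look like - the CacheKey class and its names here are hypothetical, not part of any library:

    import java.util.HashMap;
    import java.util.Map;

    public class TypedCache {
        // A hypothetical key that carries the type of the value it maps to.
        // equals/hashCode omitted for brevity; reuse the same key instance everywhere.
        public static final class CacheKey<T> {
            final String name;
            final Class<T> type;

            public CacheKey(String name, Class<T> type) {
                this.name = name;
                this.type = type;
            }
        }

        private final Map<CacheKey<?>, Object> cache = new HashMap<>();

        // The key's type parameter makes the cast safe and invisible to callers.
        public synchronized <T> T get(CacheKey<T> key) {
            return key.type.cast(cache.get(key));
        }

        public synchronized <T> void put(CacheKey<T> key, T value) {
            cache.put(key, value);
        }
    }

    A caller then writes something like CacheKey<Connection> key = new CacheKey<>("main-db", Connection.class); and gets a Connection back from get(key) with no cast in sight.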

    Recipes

    If you insist on writing all this yourself, [A] don't, and [B] that 'you want to hand the cache manager a recipe' thing is done with functional interfaces, so you can do something like:

    cm.get(key, () -> getConnection());
    

    That () -> getConnection() is a recipe. It doesn't call getConnection; it represents the idea of doing that, as a thing you can pass around. So, you pass around the idea of calling getConnection() and hand it to the cache manager. If the cache manager determines that it is necessary to call it, it can do so, and e.g. do so inside a synchronized block of some sort to ensure no 2 threads simultaneously start making a connection object for the same key. This is exactly how guava cache builder works - see the tutorial linked above.
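
    If you nevertheless want to see what such a primitive could look like by hand, here is a minimal sketch - the class and method names are assumptions, not an established API - built on ConcurrentHashMap.computeIfAbsent, which runs the recipe at most once per key while concurrent callers for that key wait:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Supplier;

    public class RecipeCache<K, V> {
        private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();

        // Atomic check-and-create: the recipe runs only if the key is absent,
        // and other threads asking for the same key block until it has finished.
        public V get(K key, Supplier<? extends V> recipe) {
            return cache.computeIfAbsent(key, k -> recipe.get());
        }

        public V remove(K key) {
            return cache.remove(key);
        }
    }

    Usage then looks like the cm.get(key, () -> getConnection()) call above. Note that computeIfAbsent requires the recipe to be short and to not touch the same map, and this sketch still does nothing about expiry, eviction, or closing evicted resources - which is exactly why the advice here is to reach for Guava (or a real connection pool) rather than rolling your own.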