Search code examples
javamultithreadingsynchronizationthread-safetysynchronized

Synchronizing on String objects in Java


I have a webapp that I am in the middle of doing some load/performance testing on, particularily on a feature where we expect a few hundred users to be accessing the same page and hitting refresh about every 10 seconds on this page. One area of improvement that we found we could make with this function was to cache the responses from the web service for some period of time, since the data is not changing.

After implementing this basic caching, in some further testing I found out that I didn't consider how concurrent threads could access the Cache at the same time. I found that within the matter of ~100ms, about 50 threads were trying to fetch the object from the Cache, finding that it had expired, hitting the web service to fetch the data, and then putting the object back in the cache.

The original code looked something like this:

private SomeData[] getSomeDataByEmail(WebServiceInterface service, String email) {

  final String key = "Data-" + email;
  SomeData[] data = (SomeData[]) StaticCache.get(key);

  if (data == null) {
      data = service.getSomeDataForEmail(email);

      StaticCache.set(key, data, CACHE_TIME);
  }
  else {
      logger.debug("getSomeDataForEmail: using cached object");
  }

  return data;
}

So, to make sure that only one thread was calling the web service when the object at key expired, I thought I needed to synchronize the Cache get/set operation, and it seemed like using the cache key would be a good candidate for an object to synchronize on (this way, calls to this method for email [email protected] would not be blocked by method calls to [email protected]).

I updated the method to look like this:

private SomeData[] getSomeDataByEmail(WebServiceInterface service, String email) {

  
  SomeData[] data = null;
  final String key = "Data-" + email;
  
  synchronized(key) {      
    data =(SomeData[]) StaticCache.get(key);

    if (data == null) {
        data = service.getSomeDataForEmail(email);
        StaticCache.set(key, data, CACHE_TIME);
    }
    else {
      logger.debug("getSomeDataForEmail: using cached object");
    }
  }

  return data;
}

I also added logging lines for things like "before synchronization block", "inside synchronization block", "about to leave synchronization block", and "after synchronization block", so I could determine if I was effectively synchronizing the get/set operation.

However it doesn't seem like this has worked. My test logs have output like:

(log output is 'threadname' 'logger name' 'message')  
http-80-Processor253 jsp.view-page - getSomeDataForEmail: about to enter synchronization block  
http-80-Processor253 jsp.view-page - getSomeDataForEmail: inside synchronization block  
http-80-Processor253 cache.StaticCache - get: object at key [[email protected]] has expired  
http-80-Processor253 cache.StaticCache - get: key [[email protected]] returning value [null]  
http-80-Processor263 jsp.view-page - getSomeDataForEmail: about to enter synchronization block  
http-80-Processor263 jsp.view-page - getSomeDataForEmail: inside synchronization block  
http-80-Processor263 cache.StaticCache - get: object at key [[email protected]] has expired  
http-80-Processor263 cache.StaticCache - get: key [[email protected]] returning value [null]  
http-80-Processor131 jsp.view-page - getSomeDataForEmail: about to enter synchronization block  
http-80-Processor131 jsp.view-page - getSomeDataForEmail: inside synchronization block  
http-80-Processor131 cache.StaticCache - get: object at key [[email protected]] has expired  
http-80-Processor131 cache.StaticCache - get: key [[email protected]] returning value [null]  
http-80-Processor104 jsp.view-page - getSomeDataForEmail: inside synchronization block  
http-80-Processor104 cache.StaticCache - get: object at key [[email protected]] has expired  
http-80-Processor104 cache.StaticCache - get: key [[email protected]] returning value [null]  
http-80-Processor252 jsp.view-page - getSomeDataForEmail: about to enter synchronization block  
http-80-Processor283 jsp.view-page - getSomeDataForEmail: about to enter synchronization block  
http-80-Processor2 jsp.view-page - getSomeDataForEmail: about to enter synchronization block  
http-80-Processor2 jsp.view-page - getSomeDataForEmail: inside synchronization block  

I wanted to see only one thread at a time entering/exiting the synchronization block around the get/set operations.

Is there an issue in synchronizing on String objects? I thought the cache-key would be a good choice as it is unique to the operation, and even though the final String key is declared within the method, I was thinking that each thread would be getting a reference to the same object and therefore would synchronization on this single object.

What am I doing wrong here?

Update: after looking further at the logs, it seems like methods with the same synchronization logic where the key is always the same, such as

final String key = "blah";
...
synchronized(key) { ...

do not exhibit the same concurrency problem - only one thread at a time is entering the block.

Update 2: Thanks to everyone for the help! I accepted the first answer about intern()ing Strings, which solved my initial problem - where multiple threads were entering synchronized blocks where I thought they shouldn't, because the key's had the same value.

As others have pointed out, using intern() for such a purpose and synchronizing on those Strings does indeed turn out to be a bad idea - when running JMeter tests against the webapp to simulate the expected load, I saw the used heap size grow to almost 1GB in just under 20 minutes.

Currently I'm using the simple solution of just synchronizing the entire method - but I really like the code samples provided by martinprobst and MBCook, but since I have about 7 similar getData() methods in this class currently (since it needs about 7 different pieces of data from a web service), I didn't want to add almost-duplicate logic about getting and releasing locks to each method. But this is definitely very, very valuable info for future usage. I think these are ultimately the correct answers on how best to make an operation like this thread-safe, and I'd give out more votes to these answers if I could!


Solution

  • Without putting my brain fully into gear, from a quick scan of what you say it looks as though you need to intern() your Strings:

    final String firstkey = "Data-" + email;
    final String key = firstkey.intern();
    

    Two Strings with the same value are otherwise not necessarily the same object.

    Note that this may introduce a new point of contention, since deep in the VM, intern() may have to acquire a lock. I have no idea what modern VMs look like in this area, but one hopes they are fiendishly optimised.

    I assume you know that StaticCache still needs to be thread-safe. But the contention there should be tiny compared with what you'd have if you were locking on the cache rather than just the key while calling getSomeDataForEmail.

    Response to question update:

    I think that's because a string literal always yields the same object. Dave Costa points out in a comment that it's even better than that: a literal always yields the canonical representation. So all String literals with the same value anywhere in the program would yield the same object.

    Edit

    Others have pointed out that synchronizing on intern strings is actually a really bad idea - partly because creating intern strings is permitted to cause them to exist in perpetuity, and partly because if more than one bit of code anywhere in your program synchronizes on intern strings, you have dependencies between those bits of code, and preventing deadlocks or other bugs may be impossible.

    Strategies to avoid this by storing a lock object per key string are being developed in other answers as I type.

    Here's an alternative - it still uses a singular lock, but we know we're going to need one of those for the cache anyway, and you were talking about 50 threads, not 5000, so that may not be fatal. I'm also assuming that the performance bottleneck here is slow blocking I/O in DoSlowThing() which will therefore hugely benefit from not being serialised. If that's not the bottleneck, then:

    • If the CPU is busy then this approach may not be sufficient and you need another approach.
    • If the CPU is not busy, and access to server is not a bottleneck, then this approach is overkill, and you might as well forget both this and per-key locking, put a big synchronized(StaticCache) around the whole operation, and do it the easy way.

    Obviously this approach needs to be soak tested for scalability before use -- I guarantee nothing.

    This code does NOT require that StaticCache is synchronized or otherwise thread-safe. That needs to be revisited if any other code (for example scheduled clean-up of old data) ever touches the cache.

    IN_PROGRESS is a dummy value - not exactly clean, but the code's simple and it saves having two hashtables. It doesn't handle InterruptedException because I don't know what your app wants to do in that case. Also, if DoSlowThing() consistently fails for a given key this code as it stands is not exactly elegant, since every thread through will retry it. Since I don't know what the failure criteria are, and whether they are liable to be temporary or permanent, I don't handle this either, I just make sure threads don't block forever. In practice you may want to put a data value in the cache which indicates 'not available', perhaps with a reason, and a timeout for when to retry.

    // do not attempt double-check locking here. I mean it.
    synchronized(StaticObject) {
        data = StaticCache.get(key);
        while (data == IN_PROGRESS) {
            // another thread is getting the data
            StaticObject.wait();
            data = StaticCache.get(key);
        }
        if (data == null) {
            // we must get the data
            StaticCache.put(key, IN_PROGRESS, TIME_MAX_VALUE);
        }
    }
    if (data == null) {
        // we must get the data
        try {
            data = server.DoSlowThing(key);
        } finally {
            synchronized(StaticObject) {
                // WARNING: failure here is fatal, and must be allowed to terminate
                // the app or else waiters will be left forever. Choose a suitable
                // collection type in which replacing the value for a key is guaranteed.
                StaticCache.put(key, data, CURRENT_TIME);
                StaticObject.notifyAll();
            }
        }
    }
    

    Every time anything is added to the cache, all threads wake up and check the cache (no matter what key they're after), so it's possible to get better performance with less contentious algorithms. However, much of that work will take place during your copious idle CPU time blocking on I/O, so it may not be a problem.

    This code could be commoned-up for use with multiple caches, if you define suitable abstractions for the cache and its associated lock, the data it returns, the IN_PROGRESS dummy, and the slow operation to perform. Rolling the whole thing into a method on the cache might not be a bad idea.