java, caching, ibm-was

Java Caching frameworks for maintaining huge data


Context: We are developing a RESTful service using Jersey 2.6 and will deploy it on WAS 8.5. This service needs to serve more than 10 million requests per day.

We need to implement a cache to store more than 300k objects (the data will come from the DB), and we need some way to update the cache on a daily basis.

  1. Is this approach of caching 300k objects and updating them on a daily basis recommended?
  2. Are there any Java frameworks which support this kind of functionality?

Solution

  • Your question is too general to get a clear answer. You need to describe the problem you are trying to solve.

    • Are you concerned about response times?
    • Are you trying to protect your DB from doing heavy lifting?
    • Are you expecting to have to scale out and want to be sure that you can deal with future loads?

    Additionally, some more contextual information would be useful, especially:

    • How dynamic is your data compared to your requests?
    • What percentage of your data population will be requested on average per day? (How many of the 3 lakh objects will be enquired upon at least once per day? If you don't know, provide your best guess).

    Your figures of 3 lakh (300k) data points and 10M requests mean that you expect to hit each object on average 33 times a day, which indicates that you are more concerned about back-end DB load than about your responses being right up to date.

    In my experience there are a lot of fairly primitive solutions which will work much better than going for a heavyweight distributed system such as Mongo, Cassandra or Coherence.

    My first response would be: Keep it simple - 300k objects is not too much to store in an internal hash table which you flush once a day and populate on first request.
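    As a rough illustration of that "keep it simple" approach, here is a minimal sketch using a plain ConcurrentHashMap that is populated lazily and cleared once a day. It assumes Java 8 is available on your WAS runtime (for computeIfAbsent and method references), and the loader function is a placeholder for whatever DB lookup you already have.

    ```java
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.function.Function;

    /** Simple in-process cache: entries load from the DB on first request, whole map flushed daily. */
    public class DailyRefreshedCache<K, V> {

        private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
        private final Function<K, V> loader; // e.g. dao::findById (placeholder)

        public DailyRefreshedCache(Function<K, V> loader) {
            this.loader = loader;
            // Flush the whole map every 24 hours; entries are re-loaded lazily afterwards.
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(cache::clear, 1, 1, TimeUnit.DAYS);
        }

        public V get(K key) {
            // Hits the DB only when the entry is not already cached.
            return cache.computeIfAbsent(key, loader);
        }
    }
    ```

    If you would rather not roll your own, Guava's CacheBuilder (with expireAfterWrite) or Ehcache gives you the same behaviour with eviction and size limits handled for you.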

    If you need to scale horizontally, I would suggest memcached (for example via the spymemcached client) with a 1-day cache time, populating it whenever you don't find an existing entry.
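    A minimal sketch of that read-through pattern with spymemcached is below; the host/port and loadFromDb are placeholders for your environment and DB access, and values stored this way need to be Serializable for the default transcoder.

    ```java
    import java.io.IOException;
    import java.net.InetSocketAddress;

    import net.spy.memcached.MemcachedClient;

    public class MemcachedReadThrough {

        private static final int ONE_DAY_SECONDS = 24 * 60 * 60;

        private final MemcachedClient client;

        public MemcachedReadThrough(String host, int port) throws IOException {
            // Assumes a memcached instance is reachable at host:port (e.g. localhost:11211).
            this.client = new MemcachedClient(new InetSocketAddress(host, port));
        }

        public Object lookup(String key) {
            Object value = client.get(key);               // cache hit?
            if (value == null) {
                value = loadFromDb(key);                  // cache miss: go to the DB
                client.set(key, ONE_DAY_SECONDS, value);  // keep it for one day
            }
            return value;
        }

        private Object loadFromDb(String key) {
            // Placeholder for your actual DAO / repository call.
            throw new UnsupportedOperationException("wire in your DB access here");
        }
    }
    ```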

    I would NOT go for something like Cassandra or Mongo unless you have really compelling reasons to require a persistent store. Rationale: purging can become really onerous, especially if your data is fast-moving. For example, Cassandra does not really know how to delete, but instead "tombstones" deleted entries, which means that your data store will simply grow and grow until you create a strategy for purging.