Need to cache 100+ million string keys (each ~100 characters long) for a standalone Java application.
Required cache properties:
- Persistent.
- Key lookups from the cache must complete within tens of milliseconds.
- Allows invalidation and expiry.
- Runs as an independent caching server, to allow multi-threaded access.
Preferably without an enterprise database, as the 100M keys can grow to 500M, which would consume heavy memory and system resources and give sluggish throughput.
Finally, here is how I resolved this big-data problem with the existing cache solutions available (Hazelcast, Guava Cache, Ehcache, etc.):
- Broke the cache into two levels.
- Grouped ~100K keys into one Java collection and associated them with a common property; in my case the keys carried a timestamp, so each timestamp slot became the key for a second-level cache block of ~100K entries.
- Each time-slot key is stored in the persistent Java cache, with the compressed Java collection as its value.
- The reason I still get good throughput despite the compression/decompression overhead of two-level caching is that my key searches are range bound: once a block is loaded, most subsequent searches are served by the in-memory Java collection decompressed for the previous search (see the sketch after this list).
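Below is a minimal Java sketch of the idea. The class name, the hourly slot size, and the use of a `ConcurrentHashMap` holding GZIP-compressed serialized `HashSet`s as a stand-in for the persistent second level are my assumptions for illustration; in the real setup the second level would be a disk-backed cache region (Ehcache, Hazelcast, etc.) and the slot size would be tuned so each block holds roughly 100K keys.

```java
import java.io.*;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Two-level cache sketch: keys are bucketed by timestamp slot, each bucket is
 * serialized + GZIP-compressed into the second (persistent) level, and the most
 * recently used bucket is kept decompressed in memory for range-bound lookups.
 */
public class TwoLevelKeyCache {

    // Stand-in for the persistent second-level cache (a disk-backed Ehcache/Hazelcast
    // region in a real deployment); maps slot -> compressed block of keys.
    private final Map<Long, byte[]> persistentCache = new ConcurrentHashMap<>();

    // First level: the last decompressed block, kept because searches are range bound.
    private long currentSlot = Long.MIN_VALUE;
    private Set<String> currentBlock = Collections.emptySet();

    // Assumed bucket size: one slot per hour; tune so a slot holds ~100K keys.
    private static final long SLOT_MILLIS = 60 * 60 * 1000L;

    private static long slotOf(long timestampMillis) {
        return timestampMillis / SLOT_MILLIS;
    }

    /** Adds a key under its timestamp slot and recompresses that slot's block. */
    public synchronized void put(String key, long timestampMillis) throws IOException {
        long slot = slotOf(timestampMillis);
        Set<String> block = loadBlock(slot);
        block.add(key);
        persistentCache.put(slot, compress(block));
        currentSlot = slot;
        currentBlock = block;
    }

    /** Returns true if the key is cached; serves from the in-memory block when possible. */
    public synchronized boolean contains(String key, long timestampMillis) throws IOException {
        long slot = slotOf(timestampMillis);
        if (slot != currentSlot) {              // level-1 miss: decompress from level 2
            currentBlock = loadBlock(slot);
            currentSlot = slot;
        }
        return currentBlock.contains(key);
    }

    private Set<String> loadBlock(long slot) throws IOException {
        byte[] compressed = persistentCache.get(slot);
        return compressed == null ? new HashSet<>() : decompress(compressed);
    }

    private static byte[] compress(Set<String> block) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(new HashSet<>(block));
        }
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    private static Set<String> decompress(byte[] compressed) throws IOException {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(compressed)))) {
            return (Set<String>) in.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        }
    }
}
```

Keeping only the last decompressed block in memory is enough precisely because the lookups are range bound; if access patterns were less regular, a small LRU of decompressed blocks would be the natural extension.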
To conclude: identify a common attribute in the keys so you can group them and break the cache into multiple levels; otherwise you will need hefty hardware and an enterprise cache to support this kind of big-data problem.