Tags: java, performance, hashmap, load-factor

When to trash hashmap contents to avoid performance degradation?


I'm working in Java with a large hashmap (millions of entries) that is built with a capacity of 10,000,000 and a load factor of .75, and it's used to cache some values.

Since cached values become useless over time (they're not accessed anymore), but I can't remove the useless ones as I go, I would like to entirely empty the cache when its performance starts to degrade. How can I decide when it's a good time to do it?

For example, with a capacity of 10 million and a load factor of .75, should I empty it when it reaches 7.5 million elements? I have tried various threshold values, but I would like to have an analytic one.

I've already tested the fact that emptying it when it's quite full is a boost for performance (the first 2-3 algorithm iterations after the wipe just fill it back, then it starts running faster than before the wipe).
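
To be concrete, the wipe I have in mind is nothing more than the sketch below; the threshold is a hand-picked placeholder, and that is exactly the number I'd like to derive analytically instead of guessing:

// Sketch of what I do now (cache is a java.util.Map): wipe everything
// once the map grows past some hand-picked threshold.
static <K, V> void wipeIfNeeded(Map<K, V> cache, int threshold)
{
    if (cache.size() > threshold)   // e.g. threshold = 7_500_000
    {
        cache.clear();              // the next iterations just fill it back
    }
}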

EDIT: ADDITIONAL INFO

The hashmap has longs as keys and floats as values. It contains cached correlations between contents: each correlation is a dot product of tag vectors, so I wanted to cache them to increase performance.

So basically what I do is compute a long key from the hash codes of the two contents:

private static long computeKey(Object o1, Object o2)
{
    int h1 = o1.hashCode();
    int h2 = o2.hashCode();

    // order the two hashes so that the key is symmetric: key(a, b) == key(b, a)
    if (h1 < h2)
    {
        int swap = h1;
        h1 = h2;
        h2 = swap;
    }

    // mask h2 to keep sign extension from overwriting the upper 32 bits
    return ((long) h1) << 32 | (h2 & 0xFFFFFFFFL);
}

and use it to retrieve stored values (roughly as in the sketch below). What happens is that, since it's a hierarchical clustering, contents get merged and their correlation values with other contents are not needed any more. That's why I want to wipe the hashmap from time to time, to avoid degradation due to the useless values left inside it.
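
The lookup itself is roughly this; correlation() is just a stand-in for the actual dot product of the two tag vectors:

// Rough sketch of the cache lookup (cache is the Map<Long, Float> described
// above); correlation() is a placeholder for the real dot-product computation.
private static float cachedCorrelation(Map<Long, Float> cache, Object o1, Object o2)
{
    long key = computeKey(o1, o2);
    Float value = cache.get(key);
    if (value == null)
    {
        value = correlation(o1, o2); // placeholder for the dot product
        cache.put(key, value);
    }
    return value;
}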

Using a WeakHashMap would unpredictably wipe out data even when it is still needed; I'd have no control over it.

Thanks


Solution

  • Why not use an LRU Cache? From Java's LinkedHashMap documentation:

    A special constructor is provided to create a linked hash map whose order of iteration is the order in which its entries were last accessed, from least-recently accessed to most-recently (access-order). This kind of map is well-suited to building LRU caches. Invoking the put or get method results in an access to the corresponding entry (assuming it exists after the invocation completes). The putAll method generates one entry access for each mapping in the specified map, in the order that key-value mappings are provided by the specified map's entry set iterator. No other methods generate entry accesses. In particular, operations on collection-views do not affect the order of iteration of the backing map.

    So basically, every once in a while as your map gets too big, just delete the first x values that the iterator gives you (a manual version of that trim is sketched after the demo code at the end of this answer).

    See documentation for removeEldestEntry to have this done for you automatically.

    Here is code that demonstrates this:

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class LruCacheDemo {
      public static void main(String[] args) {
        // Access-ordered LinkedHashMap that evicts its least-recently-used
        // entry as soon as the map grows beyond maxCapacity.
        class CacheMap<K, V> extends LinkedHashMap<K, V> {
          private final int maxCapacity;

          public CacheMap(int initialCapacity, int maxCapacity) {
            super(initialCapacity, 0.75f, true); // true = access order
            this.maxCapacity = maxCapacity;
          }

          @Override
          protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxCapacity;
          }
        }

        int[] popular = {1, 2, 3, 4, 5};
        CacheMap<Integer, Integer> myCache = new CacheMap<>(5, 10);
        for (int i = 0; i < 100; i++) {
          myCache.put(i, i);
          for (int p : popular) {
            myCache.get(p); // touching the "popular" keys keeps them recent
          }
        }

        System.out.println(myCache);
        // {95=95, 96=96, 97=97, 98=98, 99=99, 1=1, 2=2, 3=3, 4=4, 5=5}
      }
    }
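
    As mentioned above, if you would rather keep filling the map and trim it in batches, you can also walk the iterator of an access-ordered LinkedHashMap manually; the trimTo() helper and its target size below are just illustrative:

    // Illustrative helper (needs java.util.Iterator, LinkedHashMap, Map):
    // drop the least-recently-used entries of an access-ordered LinkedHashMap
    // by walking its iterator, which yields the eldest entries first.
    static <K, V> void trimTo(LinkedHashMap<K, V> cache, int targetSize) {
      Iterator<Map.Entry<K, V>> it = cache.entrySet().iterator();
      while (cache.size() > targetSize && it.hasNext()) {
        it.next();
        it.remove(); // removes the eldest remaining entry from the map
      }
    }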