Tags: java, caching, memory-management, guava, lru

How to remove elements from map if it reaches a memory size limit?


I have implemented an LRU cache using ConcurrentLinkedHashMap, and I purge events from the same map once it reaches a particular limit, as shown below.

I have a MAX_SIZE constant that should be equivalent to 3.7 GB, and as soon as my map reaches that limit I purge events from it.

Below is my code:

import java.util.Iterator;
import java.util.concurrent.ConcurrentMap;
import com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap;
import com.googlecode.concurrentlinkedhashmap.EvictionListener;

// does this really equal to 3.7 GB? can anyone explain this?
public static final int MAX_SIZE = 20000000; //equates to ~3.7GB with assumption that each event is 200 bytes AVG

public static EvictionListener<String, DataObject> listener = new EvictionListener<String, DataObject>() {
    public void onEviction(String key, DataObject value) {
        deleteEvents();
    }
};
public static final ConcurrentMap<String, DataObject> holder = new ConcurrentLinkedHashMap.Builder<String, DataObject>()
            .maximumWeightedCapacity(MAX_SIZE).listener(listener).build();

private static void deleteEvents() {
    int capacity = MAX_SIZE - (MAX_SIZE * 20 / 100); // 80% of MAX_SIZE; beware MAX_SIZE * (20 / 100), which is 0 due to integer division
    if (holder.size() >= capacity) {
        int numEventsToEvict = (MAX_SIZE * 20) / 100;
        int counter = 0;
        Iterator<String> iter = holder.keySet().iterator();
        while (iter.hasNext() && counter < numEventsToEvict) {
            String address = iter.next();
            holder.remove(address);
            System.out.println("Purging Elements: " + address);
            counter++;
        }
    }
}

// this method is called every 30 seconds from a single background thread 
// to send data to our queue
public void submit() {
    if (holder.isEmpty()) {
        return;
    }

    // some other code here

    int sizeOfMsg = 0;
    Iterator<String> iter = holder.keySet().iterator();
    int allowedBytes = MAX_ALLOWED_SIZE - ALLOWED_BUFFER;

    while (iter.hasNext() && sizeOfMsg < allowedBytes) {
        String key = iter.next();
        DataObject temp = holder.get(key);

        // some code here

        holder.remove(key);

        // some code here to send data to queue
    }
}   

// this holder map is used in below method to add the events into it.
// below method is being called from some other place.
public void addToHolderRequest(String key, DataObject stream) {
    holder.put(key, stream);
}
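
To sanity-check the 3.7 GB figure in the MAX_SIZE comment: maximumWeightedCapacity with no explicit weigher counts each entry with weight 1, so MAX_SIZE caps the map at 20,000,000 entries, not bytes. At the assumed ~200 bytes per event that is 4,000,000,000 bytes of payload, which is about 3.73 GiB. A quick check (the class name below is illustrative, not part of my code):

```java
// Sanity check: does MAX_SIZE = 20,000,000 entries at ~200 bytes each equal ~3.7 GB?
public class SizeEstimate {
    public static void main(String[] args) {
        long entries = 20_000_000L;   // MAX_SIZE: an entry count, not a byte count
        long avgEventBytes = 200L;    // assumed average event size
        long totalBytes = entries * avgEventBytes;          // 4,000,000,000 bytes
        double gib = totalBytes / (1024.0 * 1024.0 * 1024.0); // ~3.73 GiB
        System.out.printf("%,d bytes = %.2f GiB%n", totalBytes, gib);
    }
}
```

Note that this only estimates the raw payload; per-entry overhead (keys, map nodes, object headers) comes on top of it.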

Below is the Maven dependency I am using for this:

<dependency>
    <groupId>com.googlecode.concurrentlinkedhashmap</groupId>
    <artifactId>concurrentlinkedhashmap-lru</artifactId>
    <version>1.4</version>
</dependency>

I am not sure whether this is the right way to do this. Does MAX_SIZE really equate to 3.7 GB if events are 200 bytes on average? Is there a better way to do this? I also have a background thread which calls the deleteEvents() method every 30 seconds, and the same background thread also calls the submit method to extract data from the holder map and send it to the queue.

So the idea is: add events to the holder map in the addToHolderRequest method, then every 30 seconds call the submit method from the background thread, which sends data to our queue by iterating over the map; after the submit method finishes, call deleteEvents() from the same background thread to purge elements. I am running this code in production and it looks like it is not purging events properly, and my holder map size keeps growing. I have min/max heap memory set to 6 GB.


Solution

    1. In lieu of estimating the size of objects in the JVM and referencing them with strong references, you can use soft references, which are "most often used to implement memory-sensitive caches" (SoftReference). For example, CacheBuilder.softValues() from google/guava (Google Core Libraries for Java 6+): "Softly-referenced objects will be garbage-collected in a globally least-recently-used manner, in response to memory demand." However, I'd recommend first familiarizing yourself with CachesExplained · google/guava Wiki (specifically the Reference-based Eviction section).
    2. As a tweak to using soft references you can also try a "victim caching approach" as described here which uses a "normal cache that evicts to [a] soft cache, and recovers entries on a miss if possible".
    3. If you are certain you want to actually estimate the size of objects, then take a look at Ehcache and its Sizing Storage Tiers. It has Built-In Sizing Computation and Enforcement for memory-limited caches.
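
To make the soft-reference idea in point 1 concrete, here is a minimal, stdlib-only sketch (the SoftValueCache class name is hypothetical; Guava's CacheBuilder.softValues() gives you this plus proper cleanup of cleared entries, so prefer it in real code):

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of a soft-value cache: values are held through SoftReferences, so the
// GC may reclaim them under memory pressure instead of the application trying
// to estimate object sizes and evict manually.
public class SoftValueCache<K, V> {
    private final ConcurrentMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();

    public void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    public V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key, ref); // value was garbage-collected; drop the stale entry
        }
        return value;
    }

    public static void main(String[] args) {
        SoftValueCache<String, String> cache = new SoftValueCache<>();
        cache.put("event-1", "payload");
        System.out.println(cache.get("event-1")); // "payload" while strongly reachable
    }
}
```

The trade-off: eviction timing is now up to the garbage collector, so a cache miss can happen for any entry at any time, and callers must treat get() returning null as "recompute or refetch".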