Tags: java, list, memory-management, garbage-collection

Java: Does calling clear() on a large list help with faster garbage collection?


Load 1.5 Million Records from Database 1

Load 1.5 Million Records from Database 2

List<DannDB> dDb = fromNamedQuery(); //return em.createNamedQuery("").getResultList();
List<LannDB> lDb = fromNamedQuery();

Compare their data.

Update/persist into the database (using JPA).

The program finishes after two hours.

The same cycle runs every three hours, and it often fails with an OutOfMemoryError.

Does the following statement help? Does the object go out of scope (become eligible for garbage collection) with this?

    dDb.clear();

or

    dDb = null;

Or what else can I do?


Solution

  • Assuming that your goal is to reduce the occurrence of OOMEs over all other considerations ...

    Assigning null to the List object will make the entire list eligible for garbage collection. You then need to create a new (presumably empty) list to replace it.

    Calling clear() will have a similar effect [1] to nulling and recreating, though the details will depend on the List implementation. (For example, calling clear() on an ArrayList doesn't release the backing array. It just nulls the array cells.)

    If you can recycle an ArrayList for a list of roughly the same size as the original, you can avoid the garbage while growing the list. (But we don't know this is an ArrayList!)
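    To make the difference concrete, here is a minimal sketch (assuming an ArrayList, which the question does not actually confirm):

        import java.util.ArrayList;
        import java.util.List;

        public class ClearVsNull {
            public static void main(String[] args) {
                // Build a largish ArrayList, standing in for the query result.
                List<String> dDb = new ArrayList<>();
                for (int i = 0; i < 100_000; i++) {
                    dDb.add("record-" + i);
                }

                // Alternative 1: clear() nulls the cells, so the String elements
                // become unreachable, but the ArrayList keeps its (now large)
                // backing array. That capacity can be reused if the list is
                // refilled with roughly the same number of elements.
                dDb.clear();

                // Alternative 2: drop the reference entirely. The elements and
                // the backing array both become unreachable (assuming nothing
                // else refers to them), and a fresh list is created for the
                // next run.
                dDb = null;
                dDb = new ArrayList<>();

                // Neither alternative forces a collection; it only makes the
                // objects eligible for GC whenever the collector next runs.
                System.out.println("list size now: " + dDb.size());
            }
        }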

    Another factor in your use-case is that:

    List<DannDB> dDb = fromNamedQuery();
    

    is (presumably) going to create a new list anyway. That would render a clear() pointless. (Just assign null to dDb, or let the variable go out of scope or be reassigned the new list.)
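    In other words, something like this (the DannDB stub and fromNamedQuery() below are simplified stand-ins for the real entity and JPA call, just to show the reference handling):

        import java.util.ArrayList;
        import java.util.List;

        public class ReassignExample {

            // Placeholder entity and query, standing in for the JPA code in
            // the question.
            static class DannDB { }

            static List<DannDB> fromNamedQuery() {
                List<DannDB> result = new ArrayList<>();
                for (int i = 0; i < 1_000; i++) {
                    result.add(new DannDB());
                }
                return result;
            }

            public static void main(String[] args) {
                List<DannDB> dDb = fromNamedQuery();   // first run: list A

                // ... compare and persist ...

                dDb = fromNamedQuery();                // next run: list B.
                // List A is now unreachable (assuming nothing else holds a
                // reference to it) and eligible for GC, so clearing it first
                // would not have freed anything sooner.

                dDb = null;                            // optional: drop the last
                // reference as soon as the data is no longer needed, instead of
                // waiting for the variable to go out of scope.
            }
        }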

    A final issue is that it is conceivable that the list is finalizable. That could mean that the list object takes longer to be reclaimed.

    Overall, I can't say which of assigning null and calling clear() will be better for the memory footprint. Or that either of these will make a significant difference. But there is no reason why you can't try both alternatives, and observe what happens.
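    If you do experiment, one crude way to see the effect (my suggestion, not something the question asks for) is to log approximate heap usage around the point where the list is dropped, or simply run with -verbose:gc and read the GC log:

        import java.util.ArrayList;
        import java.util.List;

        public class HeapCheck {

            static long usedMb() {
                Runtime rt = Runtime.getRuntime();
                return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
            }

            public static void main(String[] args) {
                List<byte[]> dDb = new ArrayList<>();
                for (int i = 0; i < 100; i++) {
                    dDb.add(new byte[1024 * 1024]);   // hold roughly 100 MB
                }
                System.out.println("before: ~" + usedMb() + " MB in use");

                dDb = null;                           // or dDb.clear();

                // The drop only becomes visible after a GC cycle has actually
                // run, so these numbers are rough; GC logs or a profiler give
                // a much better picture.
                System.out.println("after:  ~" + usedMb() + " MB in use");
            }
        }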

    The only other things I can suggest are:

    • Increase the heap size (and the RAM footprint).
    • Change the application so that you don't need to hold entire database snapshots in memory. Depending on the nature of the comparison, you could do it in "chunks" or by streaming the records [2] (a sketch follows below).

    The last one is the only solution that is scalable; i.e. that will work with an ever larger number of records. (Modulo the time taken to deal with more records.)
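    A chunked version could look roughly like this. It is only a sketch: the named query, page size, and compareAndPersist() are placeholders, and it assumes the query has a stable ordering so pages don't shift between reads.

        import java.util.List;
        import javax.persistence.EntityManager;
        import javax.persistence.TypedQuery;

        public class ChunkedCompare {

            private static final int PAGE_SIZE = 10_000;   // tuning knob

            // Processes one source in fixed-size pages instead of loading all
            // 1.5 million rows into a single list.
            static void processInChunks(EntityManager em) {
                int offset = 0;
                while (true) {
                    TypedQuery<DannDB> q =
                            em.createNamedQuery("DannDB.findAll", DannDB.class);
                    q.setFirstResult(offset);
                    q.setMaxResults(PAGE_SIZE);
                    List<DannDB> page = q.getResultList();
                    if (page.isEmpty()) {
                        break;
                    }

                    compareAndPersist(page);   // hypothetical per-page work

                    em.flush();
                    em.clear();   // detach the page so the persistence context
                                  // doesn't keep every loaded entity reachable
                    offset += PAGE_SIZE;
                    // 'page' becomes unreachable here; each iteration only ever
                    // holds PAGE_SIZE entities in memory.
                }
            }

            // Placeholders so the sketch compiles; the real types come from
            // the application in the question.
            static class DannDB { }
            static void compareAndPersist(List<DannDB> page) { }
        }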


    Important Notes:

    1. Manually running System.gc() is unlikely to help. At best it will (just) make your application slower.
    2. Since the real problem is that you are getting OOMEs, anything that tries to get the JVM to shrink the heap by giving memory back to the OS will be counterproductive.

    [1] Similar from the perspective of storage management. Obviously, there are semantic differences between clearing a list and creating a new one; e.g. if some other part of your application has a reference to the original list.
    [2] Those of you who are old enough will remember the classic way of implementing a payroll system with magnetic tape storage. If you can select from the two data sources in the same key order, you may be able to use the classic approach to compare them; for example, reading two result sets in parallel.
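    For footnote [2], a sketch of that merge-style comparison over two key-ordered sources might look like this (the Row type and the sample keys are purely illustrative; in practice the iterators would wrap two streaming result sets):

        import java.util.Iterator;
        import java.util.List;

        public class SortedMergeCompare {

            // Minimal record for the sketch; in the question this would be a
            // DannDB/LannDB pair sharing a business key.
            record Row(long key, String payload) { }

            // Compares two sources that are both ordered by key, holding only
            // one row from each side in memory at a time (the tape-merge idea).
            static void mergeCompare(Iterator<Row> left, Iterator<Row> right) {
                Row l = left.hasNext() ? left.next() : null;
                Row r = right.hasNext() ? right.next() : null;

                while (l != null && r != null) {
                    if (l.key() < r.key()) {
                        System.out.println("only in left: " + l.key());
                        l = left.hasNext() ? left.next() : null;
                    } else if (l.key() > r.key()) {
                        System.out.println("only in right: " + r.key());
                        r = right.hasNext() ? right.next() : null;
                    } else {
                        if (!l.payload().equals(r.payload())) {
                            System.out.println("differs: " + l.key());
                        }
                        l = left.hasNext() ? left.next() : null;
                        r = right.hasNext() ? right.next() : null;
                    }
                }
                while (l != null) {                    // drain the left side
                    System.out.println("only in left: " + l.key());
                    l = left.hasNext() ? left.next() : null;
                }
                while (r != null) {                    // drain the right side
                    System.out.println("only in right: " + r.key());
                    r = right.hasNext() ? right.next() : null;
                }
            }

            public static void main(String[] args) {
                List<Row> db1 = List.of(new Row(1, "a"), new Row(2, "b"), new Row(4, "d"));
                List<Row> db2 = List.of(new Row(1, "a"), new Row(3, "c"), new Row(4, "x"));
                mergeCompare(db1.iterator(), db2.iterator());
            }
        }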