Load 1.5 Million Records from Database 1
Load 1.5 Million Records from Database 2
List<DannDB> dDb = fromNamedQuery(); //return em.createNamedQuery("").getResultList();
List<LannDB> lDb = fromNamedQuery();
Compare their data.
Update/persist into Database (Using JPA)
and the program ends after two hours.
The same iteration runs every third hour and often fails with an OutOfMemoryError.
Does either of the following statements help? Does the object go out of scope with this?
dDb.clear();
or
dDb = null;
Or what else can I do?
Assuming that your goal is to reduce the occurrence of OOMEs over all other considerations ...
Assigning null to the List variable will make the entire list eligible for garbage collection, provided that nothing else still references it. You then need to create a new (presumably empty) list to replace it.
Calling clear() will have a similar effect [1] to nulling and recreating, though the details will depend on the List implementation. (For example, calling clear() on an ArrayList doesn't release the backing array. It just nulls the array cells.)
If you can recycle an ArrayList for a list of roughly the same size as the original, you can avoid the garbage generated while growing the list. (But we don't know that this is an ArrayList!)
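To make the difference between the two options concrete, here is a minimal sketch. It assumes an ArrayList, and loadRecords() is a hypothetical stand-in for fromNamedQuery(), not the real JPA call. It only shows the visible semantics: clear() empties the one shared list object, while reassigning the variable leaves the old list intact for as long as anything else still references it.

```java
import java.util.ArrayList;
import java.util.List;

public class ListReleaseDemo {

    // Hypothetical stand-in for fromNamedQuery(): builds a fresh list on each call.
    static List<Integer> loadRecords(int n) {
        List<Integer> records = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            records.add(i);
        }
        return records;
    }

    // clear() empties the single list object, so every reference to it sees it empty.
    static int sizeSeenByAliasAfterClear() {
        List<Integer> list = loadRecords(1000);
        List<Integer> alias = list;
        list.clear();
        return alias.size();
    }

    // Reassigning the variable does not touch the old list; it stays reachable
    // (and therefore not collectable) while the alias still points at it.
    static int sizeSeenByAliasAfterReassign() {
        List<Integer> list = loadRecords(1000);
        List<Integer> alias = list;
        list = new ArrayList<>();
        return alias.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeSeenByAliasAfterClear());    // prints 0
        System.out.println(sizeSeenByAliasAfterReassign()); // prints 1000
    }
}
```

In other words, neither option frees the 1.5 million elements if some other part of the application (a cache, a field, a persistence context) still holds them.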
Another factor in your use-case is that:
List<DannDB> dDb = fromNamedQuery();
is (presumably) going to create a new list anyway. That would render a clear() pointless. (Just assign null to dDb, or let the variable go out of scope or be reassigned the new list.)
A final issue is that it is conceivable that the list is finalizable. That could mean that the list object takes longer to be reclaimed.
Overall, I can't say which of assigning null and calling clear() will be better for the memory footprint. Or that either of these will make a significant difference. But there is no reason why you can't try both alternatives, and observe what happens.
The only other things I can suggest are:
The last one is the only solution that is scalable; i.e. that will work with an ever larger number of records. (Modulo the time taken to deal with more records.)
Important Notes:
System.gc() is unlikely to help. At best it will (just) make your application slower.
1 - Similar from the perspective of storage management. Obviously, there are semantic differences between clearing a list and creating a new one; e.g. if some other part of your application has a reference to the original list.
2 - Those of you who are old enough will remember the classic way of implementing a payroll system with magnetic tape storage. If you can select from the two data sources in the same key order, you may be able to use the classic approach to compare them. For example, reading two result sets in parallel.
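The parallel read described in note 2 can be sketched roughly as follows. This is an illustration under assumptions, not the asker's actual code: the Row class, the key/value fields and the compare() method are hypothetical, and both inputs are assumed to arrive sorted by key with no duplicate keys. In a real application the iterators would wrap two result sets read in key order, and each difference would be persisted straight away rather than collected in a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SortedMergeCompare {

    // Hypothetical record shape: a sort key plus the value to compare.
    static final class Row {
        final long key;
        final String value;

        Row(long key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Returns the next row, or null when the source is exhausted.
    static Row advance(Iterator<Row> source) {
        return source.hasNext() ? source.next() : null;
    }

    // Walks two key-ordered sources in step, holding only one row from each at a time.
    static List<String> compare(Iterator<Row> left, Iterator<Row> right) {
        List<String> diffs = new ArrayList<>(); // for the demo only; persist directly in real use
        Row l = advance(left);
        Row r = advance(right);
        while (l != null || r != null) {
            if (r == null || (l != null && l.key < r.key)) {
                diffs.add("only-left:" + l.key);   // key missing from the right source
                l = advance(left);
            } else if (l == null || r.key < l.key) {
                diffs.add("only-right:" + r.key);  // key missing from the left source
                r = advance(right);
            } else {
                if (!l.value.equals(r.value)) {
                    diffs.add("changed:" + l.key); // same key, different data
                }
                l = advance(left);
                r = advance(right);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        List<Row> db1 = Arrays.asList(new Row(1, "a"), new Row(2, "b"), new Row(4, "d"));
        List<Row> db2 = Arrays.asList(new Row(1, "a"), new Row(2, "x"), new Row(3, "c"));
        // prints [changed:2, only-right:3, only-left:4]
        System.out.println(compare(db1.iterator(), db2.iterator()));
    }
}
```

Because only the current row from each source is held, memory use stays roughly constant regardless of how many records there are; the cost is that both queries must return rows in the same key order.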