My goal: end up with two "identical" lists by removing the non-matching objects from both of them, in the least possible time.
What I've achieved: two identical lists with the non-matching objects removed, but it takes too long.
My problem is:
I have two big lists (800k records each) filled with objects (hashCode and equals are correctly implemented on those objects), and I need to delete the non-matching records from both lists. There might be only 3-100 such records (nothing compared to 800k).
The problem is mainly performance: the operation takes 10+ minutes.
This is what I've tried:
retainAll: this works, but it takes too long.
HashSet.retainAll: it takes seconds and works wonderfully, but I can't use sets for my lists because I need the duplicates.
Manually: go through list 1 one element at a time, look each element up in list 2, and save the non-matching ones in a third list; repeat the operation in the other direction into a fourth list; then call removeAll on the original lists with those two lists.
Iterators: copying the lists looked like a good idea. I remove the matching elements from both copies, so each loop has fewer items to check and each element only has to be found once, because whatever remains is non-matching. Finally I call removeAll to remove the non-matching elements from the original lists, but this still takes about 10 minutes.
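A common way to get the HashSet speed without losing duplicates, shown here as a sketch rather than the poster's code: keep the lists, but pass a HashSet copy of the *other* list as the argument to retainAll. List.retainAll calls contains on its argument once per element, so a HashSet turns each lookup from a linear scan into an average constant-time hash lookup, while the list itself keeps its duplicates. The class name `RetainCommon` and method `retainCommon` are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RetainCommon {

    // Removes from each list every element that has no equal element in the other list.
    // Duplicates inside a list are kept, because only the lookup side is a Set.
    static <T> void retainCommon(List<T> a, List<T> b) {
        // Snapshot both lists as hash sets first, so contains() is a hash lookup
        // instead of a linear scan over the other list.
        Set<T> inA = new HashSet<>(a);
        Set<T> inB = new HashSet<>(b);
        a.retainAll(inB);
        b.retainAll(inA);
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>(Arrays.asList("x", "y", "y", "z"));
        List<String> b = new ArrayList<>(Arrays.asList("y", "z", "z", "w"));
        retainCommon(a, b);
        System.out.println(a); // [y, y, z]
        System.out.println(b); // [y, z, z]
    }
}
```

The relative order of the surviving elements is preserved, and with ArrayList inputs the whole operation should be roughly linear in the list sizes rather than quadratic.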
I need to find a quicker way to do this, but I can't figure one out.
About the duplicates: it sounds weird, but in my program two objects are equal if they have the same "name", even though they can have different values in other attributes that I need.
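To illustrate why a HashSet drops data under this kind of equality, here is a minimal sketch assuming a hypothetical `Item` class whose equals and hashCode use only `name`. Copying such items into a HashSet collapses same-name objects into one, but using a set only as the lookup argument of List.retainAll keeps every matching object, values included.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NameEquality {

    // Hypothetical element type: equality and hashing use only "name",
    // so two items with the same name but different values are "equal".
    static final class Item {
        final String name;
        final int value;

        Item(String name, int value) {
            this.name = name;
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Item)) return false;
            return name.equals(((Item) o).name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }

        @Override
        public String toString() {
            return name + "=" + value;
        }
    }

    public static void main(String[] args) {
        List<Item> list = new ArrayList<>();
        list.add(new Item("foo", 1));
        list.add(new Item("foo", 2));
        list.add(new Item("bar", 3));

        // A HashSet keeps only one of the two "foo" items.
        Set<Item> asSet = new HashSet<>(list);
        System.out.println(asSet.size()); // 2

        // Used only as the lookup side, the set keeps both "foo" objects in the list.
        Set<Item> other = new HashSet<>();
        other.add(new Item("foo", 99));
        list.retainAll(other);
        System.out.println(list); // [foo=1, foo=2]
    }
}
```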
I don't fully understand why equality is based on the name but not the values, or even how you decide what to keep when list A has one "foo" and list B has two "foo" entries: do you want to keep all of them?
Here is an idea: build a HashMap from each "name" to a list of the objects with that name. Then you can call retainAll on the key sets and quickly rebuild the original collections from the map's values.
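A minimal sketch of that grouping idea, assuming Java 16+ and a hypothetical `Item` record (the names `GroupByName`, `groupByName`, and `flatten` are also illustrative). Matching is done on the map keys, so Item's own equals is not involved, and each bucket keeps every object with that name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupByName {

    // Hypothetical element type; matching below uses only the name key.
    record Item(String name, int value) {}

    // Groups items by name, keeping every item (duplicates included) in its bucket.
    static Map<String, List<Item>> groupByName(List<Item> items) {
        Map<String, List<Item>> groups = new HashMap<>();
        for (Item item : items) {
            groups.computeIfAbsent(item.name(), k -> new ArrayList<>()).add(item);
        }
        return groups;
    }

    // Flattens the remaining buckets back into a list (bucket order is unspecified).
    static List<Item> flatten(Map<String, List<Item>> groups) {
        List<Item> out = new ArrayList<>();
        for (List<Item> bucket : groups.values()) {
            out.addAll(bucket);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Item> a = List.of(new Item("foo", 1), new Item("bar", 2));
        List<Item> b = List.of(new Item("foo", 3), new Item("foo", 4), new Item("baz", 5));

        Map<String, List<Item>> ga = groupByName(a);
        Map<String, List<Item>> gb = groupByName(b);

        // keySet() is a live view: retainAll on it removes whole entries from the map.
        ga.keySet().retainAll(gb.keySet());
        gb.keySet().retainAll(ga.keySet());

        System.out.println(flatten(ga)); // [Item[name=foo, value=1]]
        System.out.println(flatten(gb)); // [Item[name=foo, value=3], Item[name=foo, value=4]]
    }
}
```

Because HashMap iteration order is unspecified, the rebuilt lists can come out in a different order from the originals when several names survive; if order matters, filtering each original list against a HashSet of the other list's names keeps it.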