Comparing two large lists in java

I have to Array lists with 1000 objects in each of them. I need to remove all elements in Array list 1 which are there in Array list 2. Currently I am running 2 loops which is resulting in 1000 x 1000 operations in worst case.


List<DataClass> dbRows = object1.get("dbData");
List<DataClass> modifiedData = object1.get("dbData");
List<DataClass> dbRowsForLog = object2.get("dbData");
for (DataClass newDbRows : dbRows) {
            boolean found=false;
            for (DataClass oldDbRows : dbRowsForLog) {
                if (newDbRows.equals(oldDbRows)) {
                    found=true;
                    modifiedData.remove(oldDbRows);
                    break;
                }
            }
        }

public class DataClass{
    private int categoryPosition;
    private int subCategoryPosition;
    private Timestamp lastUpdateTime;
    private String lastModifiedUser;
    // + so many other variables 
    
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        DataClass dataClassRow = (DataClass) o;
        return  categoryPosition == dataClassRow.categoryPosition
                && subCategoryPosition == dataClassRow.subCategoryPosition && (lastUpdateTime.compareTo(dataClassRow.lastUpdateTime)==0?true:false)
                && stringComparator(lastModifiedUser,dataClassRow.lastModifiedUser);
    }

    public String toString(){
        return "DataClass[categoryPosition="+categoryPosition+",subCategoryPosition="+subCategoryPosition
                +",lastUpdateTime="+lastUpdateTime+",lastModifiedUser="+lastModifiedUser+"]";
    }
    
    public static boolean stringComparator(String str1, String str2){
         return (str1 == null ? str2 == null : str1.equals(str2));
    }

    public int hashCode() {
    int hash = 7;
    hash = 31 * hash + (int) categoryPosition;
    hash = 31 * hash + (int) subCategoryPosition
    hash = 31 * hash + (lastModifiedUser == null ? 0 : lastModifiedUser.hashCode());
    return hash;
    }
}

The best work around i could think of is create 2 sets of strings by calling tostring() method of DataClass and compare string. It will result in 1000 (for making set1) + 1000 (for making set 2) + 1000 (searching in set ) = 3000 operations. I am stuck in Java 7. Is there any better way to do this? Thanks.

Solution

Let Java's builtin collections classes handle ~~most of~~ the optimization for you by taking advantage of a HashSet. The complexity of its contains method is O(1). I would highly recommend looking up how it achieves this because it's very interesting.

List<DataClass> a = object1.get("dbData");
HashSet<DataClass> b = new HashSet<>(object2.get("dbData"));
a.removeAll(b);
return a;

And it's all done for you.

EDIT: caveat

In order for this to work, DataClass needs to implement Object::hashCode. Otherwise, you can't use any of the hash-based collection algorithms.

EDIT 2: implementing hashCode

An object's hash code does not need to change every time an instance variable changes. The hash code only needs to reflect the instance variables that determine equality.

For example, imagine each object had a unique field private final UUID id. In this case, you could determine if two objects were the same by simply testing the id value. Fields like lastUpdateTime and lastModifiedUser would provide information about the object, but two instances with the same id would refer to the same object, even if the lastUpdateTime and lastModifiedUser of each were different.

The point is that if you really want to want to optimize this, include as few fields as possible in the hash computation. From your example, it seems like categoryPosition and subCategoryPosition might be enough.

Whatever fields you choose to include, the simplest way to compute a hash code from them is to use Objects::hash rather than running the numbers yourself.