java arraylist data-structures linked-list

Remove duplicate objects from a List in Java


I know this kind of question has been asked on Stack Overflow many times before, but my issue is a little different and I could not find a similar scenario, so I am posting it here.

Problem: I need to remove duplicate objects from an ArrayList. The structure of my ArrayList is as follows:

dataList.add(new ObjectClass("a","b"));
dataList.add(new ObjectClass("c","n"));
dataList.add(new ObjectClass("b","a")); // should be counted as duplicate
dataList.add(new ObjectClass("z","x"));

I need to remove objects from the above list so that the combinations "a,b" and "b,a" are treated as duplicates and one of them is removed.

My solution: step 1) Override the equals method in the DataClass class

class DataClass {
      String source;
      String destination;
    
      DataClass(String src, String dest) {
        this.source = src;
        this.destination = dest;
      }
    
      // getter setter for source and destination variables

    @Override
    public boolean equals(Object obj) {
        System.out.println("inside equals");
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        ObjectClass other = (ObjectClass) obj;
        if(i.equals(other.getJ())
            && j.equals(other.getI())) {
            return true;
        } else return false;
    }

step 2) method to remove duplicates

public List<DataClass> removeDuplicates(List<DataClass> dataList) {
            List<DataClass> resultList = new ArrayList<DataClass>();
        
            // Convert array list to Linked list
            LinkedList<DataClass> linkedList = new LinkedList<DataClass>();
            for(DataClass obj: dataList) {
                linkedList.add(obj);
            }
            
            // Iterate through linked list and remove if values are duplicates
            for(int i = 0; i<linkedList.size();i++) {
              for(int j = i+1;j<linkedList.size();j++) {
                if(linkedList.get(j).equals(linkedList.get(i))) {
                    linkedList.remove();
                }
              }
            }
            resultList.addAll(linkedList);
            return resultList;
        }

I am still looking for a better-optimized solution, if there is one. Thanks in advance.

Update with solution: the comparison logic in my equals method needed correcting. Here is my updated class (now named ObjectClass instead of DataClass), including the corrected equals override:

public class ObjectClass {
    String i;
    String j;
    public  ObjectClass(String i, String j) {
        this.i = i;
        this.j = j;
    }
    // getters setters
    // override hashcode
    
    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        ObjectClass other = (ObjectClass) obj;

        // Equal when both fields match, either in the same order or swapped
        boolean sameOrder = i.equals(other.getI()) && j.equals(other.getJ());
        boolean swapped = i.equals(other.getJ()) && j.equals(other.getI());
        return sameOrder || swapped;
    }   
}
After fixing the equals method, I tried the implementation below in the removeDuplicates method, as Janos suggested, and it works as expected:

     for(ObjectClass obj: dataList) {
         if(!resultList.contains(obj))
             resultList.add(obj);
     }
    

Solution

  • There are several problems here:

    class DataClass {
          String source;
          String destination;
    
          // ...
    
        @Override
        public boolean equals(Object obj) {
            // ...
            ObjectClass other = (ObjectClass) obj;
            if(i.equals(other.getJ())
                && j.equals(other.getI())) {
                return true;
            } else return false;
        }
    

    The equals method casts the other object to ObjectClass. It should cast to the same class where this method is defined: DataClass.

    The equals method compares i and j variables, but they are not defined anywhere within the class. There's source and destination.

    Apart from the same-instance check, the equals method returns true only when this.i equals other.j and this.j equals other.i. In other words, (a, b) is equal to (b, a), but it is not equal to another (a, b) object unless a and b happen to be the same string. That's very strange, and probably not what you intended.
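
    As a minimal sketch of what a repaired class could look like, assuming the intent is that (a, b) equals both (a, b) and (b, a): the version below casts to DataClass, compares source and destination, and adds a matching hashCode. The getter names and the use of java.util.Objects for null-safe comparison are assumptions for illustration, not part of the original code.

```java
import java.util.Objects;

// Sketch of an order-insensitive pair; field names follow the question's DataClass.
class DataClass {
    private final String source;
    private final String destination;

    DataClass(String source, String destination) {
        this.source = source;
        this.destination = destination;
    }

    String getSource() { return source; }
    String getDestination() { return destination; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        DataClass other = (DataClass) obj;
        // Same values in the same order, e.g. (a, b) vs (a, b)
        boolean sameOrder = Objects.equals(source, other.source)
                && Objects.equals(destination, other.destination);
        // Same values in swapped order, e.g. (a, b) vs (b, a)
        boolean swapped = Objects.equals(source, other.destination)
                && Objects.equals(destination, other.source);
        return sameOrder || swapped;
    }

    @Override
    public int hashCode() {
        // Addition is commutative, so (a, b) and (b, a) get the same hash,
        // which keeps hashCode consistent with the equals above.
        return Objects.hashCode(source) + Objects.hashCode(destination);
    }
}
```

    Checking both orders explicitly, rather than "either field matches either field", avoids treating pairs such as (a, a) and (a, b) as equal.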


    The removeDuplicates method is overcomplicated. For example, converting an array list to a linked list is unnecessary. Here's a much simpler algorithm:

    • For each value in the source list
      • If the value does not exist in the result list, add it to the result list

    That's it.

    List<DataClass> result = new ArrayList<>();
    
    for (DataClass item : dataList) {
        if (!result.contains(item)) {
            result.add(item);
        }
    }
    return result;
    

    This assumes that the implementation of the equals method is fixed. Otherwise the result.contains step will not work correctly.

    Also note that result.contains performs a linear search: it checks every item until it finds a match. You could improve performance by using a set.
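
    One possible way to apply the set idea, sketched here under some assumptions: instead of relying on equals and hashCode, keep a HashSet of order-independent keys (the two strings sorted) and add an item to the result only the first time its key is seen. The DataClass below is a stripped-down stand-in with just the two fields, PairDeduplicator is a hypothetical helper name, and the code assumes neither field is null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal stand-in for the question's class (assumption: fields are non-null).
class DataClass {
    final String source;
    final String destination;

    DataClass(String source, String destination) {
        this.source = source;
        this.destination = destination;
    }
}

// Hypothetical helper, named here only for illustration.
class PairDeduplicator {
    static List<DataClass> removeDuplicates(List<DataClass> dataList) {
        Set<List<String>> seen = new HashSet<>();
        List<DataClass> result = new ArrayList<>();
        for (DataClass item : dataList) {
            // Sorting the two values gives (a, b) and (b, a) the same key.
            String[] key = {item.source, item.destination};
            Arrays.sort(key);
            // Set.add returns false if the key was already present.
            if (seen.add(Arrays.asList(key))) {
                result.add(item);
            }
        }
        return result;
    }
}
```

    This keeps the first occurrence of each pair in its original position and replaces the linear result.contains scan with a hash lookup per item. If equals and hashCode are already consistent, copying the list into a LinkedHashSet would be another option.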