
How does the java.util.stream.Stream.distinct() method work? Can I override the equals() method for the objects in the stream?


My use case is that I am trying to use the distinct method of Stream to remove students with the same roll number from a list of objects of class StudentCourseMapping.

POJO details are below:

public class StudentCourseMapping implements Serializable{
    private String name;
    private String dept;
    private Integer roll;
    private String course;

Below is the equals method:

    @Override
    public boolean equals(Object obj) {
        StudentCourseMapping other = (StudentCourseMapping) obj;
        if (roll == null) {
            if (other.roll != null)
                return false;
        } else if (!roll.equals(other.roll))
            return false;
        return true;
    }

Below is the implementation:

public class RemoveDuplicateUsingStream {
    public static void main(String[] args) {
        List<StudentCourseMapping> studentCourceList = JacksonJSONReaderObjectMapper.jsonReader();
        
        studentCourceList.stream().distinct().forEach(System.out::println);
        StudentCourseMapping s0 = studentCourceList.get(0);
        StudentCourseMapping s1 = studentCourceList.get(1);
        System.out.println(s0.equals(s1));

        Set<Integer> st = new HashSet();
        List<StudentCourseMapping>studentCourceList2 = studentCourceList.stream().filter(s -> st.add(s.getRoll()))
                .collect(Collectors.toCollection(ArrayList::new));
        System.out.println(studentCourceList2.size());
    }
}

And the output is

StudentCourseMapping [name=Alu, dept=Physics, roll=12, course=Quantum Theory]
StudentCourseMapping [name=Alu, dept=Physics, roll=12, course=English]
StudentCourseMapping [name=Sam, dept=Commerce, roll=16, course=English]
StudentCourseMapping [name=Sam, dept=Commerce, roll=16, course=Accounts]
StudentCourseMapping [name=Joe, dept=Arts, roll=19, course=English]
StudentCourseMapping [name=Joe, dept=Arts, roll=19, course=Hindi]
true
3

JacksonJSONReaderObjectMapper.jsonReader() is a custom method which reads the JSON below. I am able to achieve the same result by using filter and adding to a HashSet, but I really want to know what is wrong with my use of distinct.

{
    "studentCourseMapping": [
        {
            "name": "Alu",
            "dept": "Physics",
            "roll": 12,
            "course": "Quantum Theory"
        },
        {
            "name": "Alu",
            "dept": "Physics",
            "roll": 12,
            "course": "English"
        },
        {
            "name": "Sam",
            "dept": "Commerce",
            "roll": 16,
            "course": "English"
        },
        {
            "name": "Sam",
            "dept": "Commerce",
            "roll": 16,
            "course": "Accounts"
        },
        {
            "name": "Joe",
            "dept": "Arts",
            "roll": 19,
            "course": "English"
        },
        {
            "name": "Joe",
            "dept": "Arts",
            "roll": 19,
            "course": "Hindi"
        }
    ]
}

When I test the equals method directly, it works properly and returns true, since both s0 and s1 have roll 12.

        StudentCourseMapping s0 = studentCourceList.get(0);
        StudentCourseMapping s1 = studentCourceList.get(1);
        System.out.println(s0.equals(s1));

But when I use distinct, all the objects are printed. Also, when I debug in Eclipse, the equals method I wrote is not called, although the documentation says it should be. I am using JDK 11:

Stream<T> distinct()

Returns a stream consisting of the distinct elements (according to Object.equals(Object)) of this stream.


Solution

  • As @RealSkeptic and @Jesper have already said in the comments, you do have to override the hashCode method of your StudentCourseMapping class in order to properly compare the elements in the stream and keep only the distinct ones according to your equals implementation.

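    The distinct documentation does not mention it, since it is considered implied and is amply covered in the equals documentation, but whenever we override the equals() method we must also override the hashCode() method in order to honor the general hashCode contract. In OpenJDK, distinct() tracks the elements it has already seen in a hash-based set, so equals() is only consulted for elements whose hash codes match. With the default identity-based hashCode() that practically never happens, which is why your equals() was never called.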

    Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

    In fact, your snippet produces the desired output once the hashCode() method is properly overridden.

    import java.io.Serializable;
    import java.util.Objects;

    public class StudentCourseMapping implements Serializable {
        private String name;
        private String dept;
        private Integer roll;
        private String course;
    
        //... your class implementation ...
    
        @Override
        public boolean equals(Object obj) {
            if (obj == null) return false;
            if (obj == this) return true;
            if (obj.getClass() != getClass()) return false;
            StudentCourseMapping student = (StudentCourseMapping) obj;
            return Objects.equals(roll, student.roll);
        }
    
        @Override
        public int hashCode() {
            return Objects.hashCode(roll);
        }
    
        @Override
        public String toString() {
            return String.format("[name = %s, dept = %s, roll = %d, course = %s]", name, dept, roll, course);
        }
    }
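
To see the effect of hashCode() in isolation, here is a minimal sketch that does not depend on the JSON reader. The class DistinctHashCodeDemo and its nested types EqualsOnly and EqualsAndHashCode are hypothetical, simplified stand-ins for StudentCourseMapping (only roll and course are kept), and the sample data is trimmed to three entries. It assumes Java 9 or later for List.of.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical demo class; the nested types are simplified stand-ins
// for StudentCourseMapping and are not part of the original code.
class DistinctHashCodeDemo {

    // Overrides equals() only, like the class in the question.
    static class EqualsOnly {
        final Integer roll;
        final String course;

        EqualsOnly(Integer roll, String course) {
            this.roll = roll;
            this.course = course;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof EqualsOnly)) return false;
            return Objects.equals(roll, ((EqualsOnly) obj).roll);
        }
        // hashCode() is deliberately inherited from Object (identity-based).
    }

    // Overrides both equals() and hashCode(), based on the same field.
    static class EqualsAndHashCode {
        final Integer roll;
        final String course;

        EqualsAndHashCode(Integer roll, String course) {
            this.roll = roll;
            this.course = course;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof EqualsAndHashCode)) return false;
            return Objects.equals(roll, ((EqualsAndHashCode) obj).roll);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(roll);
        }
    }

    // Number of elements distinct() keeps when only equals() is overridden.
    static int distinctCountEqualsOnly() {
        return List.of(
                new EqualsOnly(12, "Quantum Theory"),
                new EqualsOnly(12, "English"),
                new EqualsOnly(16, "Accounts"))
            .stream().distinct().collect(Collectors.toList()).size();
    }

    // Number of elements distinct() keeps when hashCode() agrees with equals().
    static int distinctCountEqualsAndHashCode() {
        return List.of(
                new EqualsAndHashCode(12, "Quantum Theory"),
                new EqualsAndHashCode(12, "English"),
                new EqualsAndHashCode(16, "Accounts"))
            .stream().distinct().collect(Collectors.toList()).size();
    }

    public static void main(String[] args) {
        // Typically all 3 elements survive, because identity hash codes differ.
        System.out.println("equals only: " + distinctCountEqualsOnly());
        // The two elements with roll 12 collapse into one, leaving 2.
        System.out.println("equals + hashCode: " + distinctCountEqualsAndHashCode());
    }
}
```

The first count is not strictly guaranteed, since two identity hash codes could collide by chance, but in practice no duplicates are removed. The second count is deterministic because equal roll values always produce equal hash codes.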