Java Stream distinct() chained with operations that modify stream elements produces duplicate results


I was experimenting with Java Streams and distinct() and came across a scenario where, if I modify an object of the Stream in an operation after distinct(), the final result contains duplicate items.

Is it expected that an object moves on to the next operation before distinct() has finished processing all elements? How can I ensure uniqueness without iterating through the entire list first?

Note: Lombok's @Data annotation includes @EqualsAndHashCode, so the Dto class automatically gets generated equals() and hashCode() methods based on its id field.

package br.com.marcusvoltolim;

import lombok.AllArgsConstructor;
import lombok.Data;

import java.util.*;
import java.util.stream.Collectors;

@Data
@AllArgsConstructor
class Dto {

    private Long id;

}

public class Main {

    public static void main(String[] args) {
        Dto dto0 = new Dto(0L);
        Dto dto1 = new Dto(1L);
        Dto dto2 = new Dto(1L);

        List<Dto> list = Arrays.asList(dto0, dto1, dto2);

        System.out.println("Original list: " + list);
        System.out.println("List with only distinct: " + list.stream().distinct().collect(Collectors.toList()));

        List<Dto> streamList = list.stream()
            .distinct()
            .map(dto -> {
                if (dto.getId() == 1L) {
                    dto.setId(3L);
                }
                return dto;
            })
            .collect(Collectors.toList());

        System.out.println("Java Stream with map after distinct: " + streamList);
    }

}

Result:

Original list: [Dto(id=0), Dto(id=1), Dto(id=1)]
List with only distinct: [Dto(id=0), Dto(id=1)]
Java Stream with map after distinct: [Dto(id=0), Dto(id=3), Dto(id=3)]

I expected the result: [Dto(id=0), Dto(id=3)]


Solution

  • To get the behaviour you expected, you must not mutate the list elements while the stream is processing them.

    You have:

    if (dto.getId() == 1L) {
        dto.setId(3L); // <--- Changing the contents
    }
    

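    So you are changing the value, and with it the result of equals() and hashCode(), of an element that distinct() has already seen. A stream pushes elements through the pipeline one at a time, so dto1 has already been changed to id=3 by the time dto2 (still id=1) reaches distinct(). The two no longer compare as equal, so dto2 passes through as well.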

    For it to work correctly, you have to really MAP and not MODIFY:

    if (dto.getId() == 1L) {
        return new Dto(3L); // <--- Map to new object
    } else {
        return new Dto(dto.getId()); // <--- Map to new object
    }
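
A self-contained sketch of the difference, under a few assumptions: the class name DistinctWithoutMutation and the helper methods sample(), mutateAfterDistinct() and mapToNewObjects() are made up for illustration, and Dto is written by hand with equals()/hashCode() on id as a stand-in for the Lombok @Data class from the question. The output of the mutating variant depends on how distinct() is implemented, so it is only printed, not promised.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class DistinctWithoutMutation {

    // Hand-written stand-in for the Lombok @Data class from the question.
    static final class Dto {
        private Long id;

        Dto(Long id) { this.id = id; }

        Long getId() { return id; }

        void setId(Long id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Dto && Objects.equals(id, ((Dto) o).id);
        }

        @Override
        public int hashCode() { return Objects.hashCode(id); }

        @Override
        public String toString() { return "Dto(id=" + id + ")"; }
    }

    // Same input as in the question: one Dto with id 0, two with id 1.
    static List<Dto> sample() {
        return Arrays.asList(new Dto(0L), new Dto(1L), new Dto(1L));
    }

    // Variant from the question: mutates elements that distinct() has already seen.
    static List<Dto> mutateAfterDistinct(List<Dto> source) {
        return source.stream()
            .distinct()
            .map(dto -> {
                if (dto.getId() == 1L) {
                    dto.setId(3L); // changes equals()/hashCode() of an already-seen element
                }
                return dto;
            })
            .collect(Collectors.toList());
    }

    // Variant from the answer: leaves the source objects untouched and maps to new ones.
    static List<Dto> mapToNewObjects(List<Dto> source) {
        return source.stream()
            .distinct()
            .map(dto -> new Dto(dto.getId() == 1L ? 3L : dto.getId()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The question reported three elements here, two of them equal.
        System.out.println("Mutating:     " + mutateAfterDistinct(sample()));
        // Prints [Dto(id=0), Dto(id=3)]
        System.out.println("Non-mutating: " + mapToNewObjects(sample()));
    }
}
```

Because mapToNewObjects() never touches the objects that distinct() is tracking, the second Dto with id 1 is still recognised as a duplicate, and the source list keeps its original values.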