java, csv, java-stream

Retrieving repeated records from a CSV using the Java Stream API


I'm new to the Java Stream API, which I understand to be an alternative to explicit loops. I would like to know whether a stream can filter a CSV file, as shown below, so that only the repeated records are kept in the result, grouped by the Center field.

Initial CSV file

(image not included)

Final result

(image not included)

In addition, the same pair must not appear in the final result a second time in reverse order, as in the table below:

This shouldn't happen

(image not included)

Is there a way to do this with a stream and grouping at the same time, given that the task would in theory need two nested loops?


Solution

  • From your examples, I understand that you consider an entry a duplicate if all of its attributes except the ID have the same values as another entry's. You can use anyMatch for this:

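    // Keep every entry that has at least one duplicate elsewhere in the list
    List<CsvEntry> duplicates = list.stream()
            .filter(x -> list.stream().anyMatch(y -> isDuplicate(x, y)))
            .collect(Collectors.toList());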
    

    So what does isDuplicate(x, y) do?

    It returns a boolean: true when the two entries have different IDs but the same value in every other field:

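    // True when x and y are different rows (different IDs)
    // whose remaining fields all hold equal values.
    private boolean isDuplicate(CsvEntry x, CsvEntry y) {
        return !x.getId().equals(y.getId())
                && x.getName().equals(y.getName())
                && x.getMother().equals(y.getMother())
                && x.getBirth().equals(y.getBirth());
    }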
    

    I've assumed that all the fields are read as String; adjust the comparisons to match the actual types. The result contains every duplicated entry together with its ID.
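
    As for doing the filtering and the grouping in one go, a possible alternative is to group the rows by every field except the ID and keep only the groups with more than one row. The following is a minimal sketch, not a definitive implementation. The CsvEntry class, the DuplicateGroups class, the groupDuplicates method and the sample rows are all made up for illustration; only the getter names come from the code above. The Center column from the question is left out of the key because isDuplicate does not use it, so add it to the key if it should take part in the comparison.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateGroups {

    // Hypothetical row type; the getters mirror those used in isDuplicate.
    static final class CsvEntry {
        private final String id;
        private final String name;
        private final String mother;
        private final String birth;

        CsvEntry(String id, String name, String mother, String birth) {
            this.id = id;
            this.name = name;
            this.mother = mother;
            this.birth = birth;
        }

        String getId() { return id; }
        String getName() { return name; }
        String getMother() { return mother; }
        String getBirth() { return birth; }
    }

    // Groups rows by all fields except the ID, then keeps only the groups
    // that contain more than one row, i.e. the repeated records.
    static List<List<CsvEntry>> groupDuplicates(List<CsvEntry> rows) {
        // A List of the compared fields serves as the grouping key, because
        // List.equals and List.hashCode compare element by element.
        Map<List<String>, List<CsvEntry>> groups = rows.stream()
                .collect(Collectors.groupingBy(
                        e -> Arrays.asList(e.getName(), e.getMother(), e.getBirth()),
                        LinkedHashMap::new, // keeps groups in first-seen order
                        Collectors.toList()));

        return groups.values().stream()
                .filter(group -> group.size() > 1)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample rows: IDs 1 and 3 differ only in the ID.
        List<CsvEntry> rows = Arrays.asList(
                new CsvEntry("1", "Ana", "Maria", "2001-03-04"),
                new CsvEntry("2", "Bruno", "Carla", "1999-12-01"),
                new CsvEntry("3", "Ana", "Maria", "2001-03-04"));

        for (List<CsvEntry> group : groupDuplicates(rows)) {
            System.out.println(group.stream()
                    .map(CsvEntry::getId)
                    .collect(Collectors.toList())); // prints [1, 3]
        }
    }
}
```

    Because each set of repeated rows ends up in exactly one group, a pair such as (1, 3) cannot show up again as (3, 1). This version also makes a single pass over the list to build the groups, whereas the anyMatch approach compares every row against every other row.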