java spring spring-boot elasticsearch spring-data-elasticsearch

How to find and mark duplicates in Elasticsearch

I have two ES indices that contain data about people (name, birth_date, etc.). Some people are present in both indices, for example:

index1

_id	first_name	last_name	birth_date
qqwew	demo	demo	1998.10.10
etroty	demo2	demo2	1995.11.11
werewr	demo3	demo3	1997.09.09

index2

_id	first_name	last_name	birth_date
sdfll	demo514	demo514	2001.11.04
fdgdg	demo2	demo2	1995.11.11
sdfdfg	demo512	demo512	2000.05.16

As you can see, this entry is contained in both indices (compared by first_name, last_name & birth_date):

_id	first_name	last_name	birth_date	...
id is different	demo2	demo2	1995.11.11

I need to find such entries and add an additional field containing a unique id that the matching entries share, so index1 & index2 should look like this afterwards:

index1

_id	first_name	last_name	birth_date	unique_id
qqwew	demo	demo	1998.10.10	null
etroty	demo2	demo2	1995.11.11	QWERTY
werewr	demo3	demo3	1997.09.09	null

index2

_id	first_name	last_name	birth_date	unique_id
sdfll	demo514	demo514	2001.11.04	null
fdgdg	demo2	demo2	1995.11.11	QWERTY
sdfdfg	demo512	demo512	2000.05.16	null

My data comes as CSV files, which are parsed & imported into ES (via Java). I'm not sure at which stage I should do something like this, or whether it's even possible with ES.

Solution

For those wondering how I solved this: I did not. The best solution is hashing, but it does not completely suit my needs.
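
For reference, here is a minimal sketch of what the hashing idea could look like at the import stage, after the CSV rows are parsed and before they are sent to ES. Everything named here (DuplicateMarker, Person, matchKey, markDuplicates) is hypothetical and only for illustration. The sketch assumes that trimming and lower-casing the names is acceptable normalization, and it leaves out the actual indexing call. The id is a SHA-256 hash of first_name, last_name and birth_date, so equal people get the same id in both indices without any lookup.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class DuplicateMarker {

    /** Hypothetical holder for one parsed CSV row (not a Spring Data entity). */
    static final class Person {
        final String firstName;
        final String lastName;
        final String birthDate;
        String uniqueId; // stays null unless the person occurs in both indices

        Person(String firstName, String lastName, String birthDate) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.birthDate = birthDate;
        }
    }

    /** Deterministic key: hex SHA-256 of the normalized comparison fields. */
    static String matchKey(Person p) {
        // Assumption: names are compared case-insensitively, ignoring outer whitespace.
        String normalized = String.join("|",
                p.firstName.trim().toLowerCase(Locale.ROOT),
                p.lastName.trim().toLowerCase(Locale.ROOT),
                p.birthDate.trim());
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(normalized.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Sets uniqueId on every row whose key occurs in both lists. */
    static void markDuplicates(List<Person> index1Rows, List<Person> index2Rows) {
        Set<String> keysInIndex1 = new HashSet<>();
        for (Person p : index1Rows) {
            keysInIndex1.add(matchKey(p));
        }
        Set<String> sharedKeys = new HashSet<>();
        for (Person p : index2Rows) {
            String key = matchKey(p);
            if (keysInIndex1.contains(key)) {
                p.uniqueId = key;
                sharedKeys.add(key);
            }
        }
        for (Person p : index1Rows) {
            String key = matchKey(p);
            if (sharedKeys.contains(key)) {
                p.uniqueId = key;
            }
        }
    }

    public static void main(String[] args) {
        List<Person> index1 = Arrays.asList(
                new Person("demo", "demo", "1998.10.10"),
                new Person("demo2", "demo2", "1995.11.11"),
                new Person("demo3", "demo3", "1997.09.09"));
        List<Person> index2 = Arrays.asList(
                new Person("demo514", "demo514", "2001.11.04"),
                new Person("demo2", "demo2", "1995.11.11"),
                new Person("demo512", "demo512", "2000.05.16"));
        markDuplicates(index1, index2);
        for (Person p : index1) {
            System.out.println(p.firstName + " -> " + p.uniqueId);
        }
    }
}
```

Note that a hash only matches rows whose normalized fields are exactly equal, so a typo or a differently formatted birth_date produces a different id. Both lists also have to fit in memory for this version of markDuplicates.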