I have two ES indices that contain data about people (name, birth_date, etc.). Some people are present in both indices, for example:
index1
_id | first_name | last_name | birth_date | ... |
---|---|---|---|---|
qqwew | demo | demo | 1998.10.10 | |
etroty | demo2 | demo2 | 1995.11.11 | |
werewr | demo3 | demo3 | 1997.09.09 | |
index2
_id | first_name | last_name | birth_date | ... |
---|---|---|---|---|
sdfll | demo514 | demo514 | 2001.11.04 | |
fdgdg | demo2 | demo2 | 1995.11.11 | |
sdfdfg | demo512 | demo512 | 2000.05.16 | |
As you can see, this entry is contained in both indices (compared by first_name, last_name & birth_date):
_id | first_name | last_name | birth_date | ... |
---|---|---|---|---|
id is different | demo2 | demo2 | 1995.11.11 | |
I need to find such entries and add an additional field containing a shared unique id, so that index1 & index2 look like this afterwards:
index1
_id | first_name | last_name | birth_date | unique_id |
---|---|---|---|---|
qqwew | demo | demo | 1998.10.10 | null |
etroty | demo2 | demo2 | 1995.11.11 | QWERTY |
werewr | demo3 | demo3 | 1997.09.09 | null |
index2
_id | first_name | last_name | birth_date | unique_id |
---|---|---|---|---|
sdfll | demo514 | demo514 | 2001.11.04 | null |
fdgdg | demo2 | demo2 | 1995.11.11 | QWERTY |
sdfdfg | demo512 | demo512 | 2000.05.16 | null |
My data comes as CSV files which are parsed & imported into ES (via Java). I'm not sure at which stage I should do this, or whether it's even possible with ES.
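One possible stage is the Java import itself, before anything is sent to ES. Below is a minimal sketch of that idea, assuming both CSV files fit in memory and that each parsed row is a `String[]` of `{first_name, last_name, birth_date}`. The class `CrossIndexMatcher` and its methods are hypothetical names for illustration, not part of Elasticsearch or its Java client.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical helper: finds people present in both data sets
// and gives each such person one shared id.
class CrossIndexMatcher {

    // Comparison key built from the three fields used for matching.
    // A control character separates the fields so that different
    // field combinations cannot produce the same key.
    static String key(String firstName, String lastName, String birthDate) {
        return firstName + '\u001F' + lastName + '\u001F' + birthDate;
    }

    // Returns a map from comparison key to shared unique id.
    // Only keys that occur in BOTH row lists get an entry.
    static Map<String, String> sharedIds(List<String[]> index1Rows,
                                         List<String[]> index2Rows) {
        Set<String> seenInIndex1 = new HashSet<>();
        for (String[] row : index1Rows) {
            seenInIndex1.add(key(row[0], row[1], row[2]));
        }

        Map<String, String> ids = new HashMap<>();
        for (String[] row : index2Rows) {
            String k = key(row[0], row[1], row[2]);
            if (seenInIndex1.contains(k)) {
                // One random id per matching person, reused for repeats.
                ids.computeIfAbsent(k, unused -> UUID.randomUUID().toString());
            }
        }
        return ids;
    }
}
```

While building each document for indexing, `sharedIds(...).get(key(...))` would then return the shared id for a person found in both files and `null` for everyone else, which matches the `unique_id` column in the tables above.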
For those wondering how I solved this: I did not. The best solution is hashing, but it does not completely suit my needs.
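For reference, the hashing idea can be sketched roughly as follows: derive the id deterministically from the three compared fields, so the same person gets the same value in both indices without any lookup between them. The class name `PersonHash`, the choice of SHA-256, and the trimming and lowercasing of names are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper: computes a deterministic id from the matching fields.
class PersonHash {

    static String uniqueId(String firstName, String lastName, String birthDate) {
        // Assumed normalization: ignore surrounding whitespace and letter case
        // in names. A control character keeps the fields unambiguous.
        String canonical = firstName.trim().toLowerCase(Locale.ROOT) + '\u001F'
                + lastName.trim().toLowerCase(Locale.ROOT) + '\u001F'
                + birthDate.trim();
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(canonical.getBytes(StandardCharsets.UTF_8));

            // Hex-encode the 32-byte digest into a 64-character string.
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Note that this assigns a value to every record, including people who exist in only one index, so it does not by itself produce the `null` entries shown in the desired tables.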