I need to update a field of a document in Elasticsearch by adding the number of times that document's id appears in a list built in Python code. The weight field contains the count of the document in a dataset. The dataset is updated from time to time, so the count of each document must be updated too. hashed_ids is a list of the document ids in the new batch of data. The weight of each matched id must be increased by the number of times that id occurs in hashed_ids.
I tried the code below but it does not work.
import hashlib

hashed_ids = [hashlib.md5(doc.encode('utf-8')).hexdigest() for doc in shingles]
update_with_query_body = {
    "script": {
        "source": "ctx._source.content_completion.weight +=param.count",
        "lang": "painless",
        "param": {
            "count": hashed_ids.count("ctx.['_id']")
        }
    },
    "query": {
        "ids": {
            "values": hashed_ids
        }
    }
}
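A quick check of one reason this request cannot work as written: the dict is built in Python, so hashed_ids.count("ctx.['_id']") runs once on the client before anything is sent, and it counts occurrences of the literal string ctx.['_id'], which is never an MD5 hex digest. The sketch below uses a hypothetical shingles list to show the value that actually ends up in the request. (Separately, Painless reads script parameters through params, and the request key is params as well, not param.)

```python
import hashlib

# Hypothetical sample input; in the question, shingles comes from the new data batch.
shingles = ["foo", "foo", "foo", "bar"]
hashed_ids = [hashlib.md5(doc.encode("utf-8")).hexdigest() for doc in shingles]

# Evaluated by Python while building the dict, not by Elasticsearch per document.
# It counts the literal string "ctx.['_id']", so the result is always 0.
count_sent = hashed_ids.count("ctx.['_id']")
print(count_sent)  # 0
```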
For example, say a doc with id d1b145716ce1b04ea53d1ede9875e05a and weight 5 is already present in the index, and the string d1b145716ce1b04ea53d1ede9875e05a is repeated three times in hashed_ids. The update-by-query request shown above will match that doc, and I need to add 3 to 5 so that the final weight is 8.
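The required arithmetic can be checked on the client side with collections.Counter. This is only a sketch of the expected result using the values from the example, not the Elasticsearch update itself.

```python
from collections import Counter

# Values taken from the example above.
doc_id = "d1b145716ce1b04ea53d1ede9875e05a"
current_weight = 5
hashed_ids = [doc_id, doc_id, doc_id]  # the id appears three times in the new batch

counts = Counter(hashed_ids)
new_weight = current_weight + counts[doc_id]
print(new_weight)  # 8
```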
I'm not familiar with Python, but here is an example-based solution with a few assumptions.
Let's say the following is the extracted hashed_ids:
hashed_ids = ["id1","id1","id1","id2"]
To use it in a terms query, we only need the unique list of ids, i.e.
hashed_ids_unique = ["id1", "id2"]
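In the asker's Python code, the unique list could be derived from hashed_ids with dict.fromkeys, which drops duplicates while keeping the original order. This is one common idiom; set(hashed_ids) also works if order does not matter.

```python
hashed_ids = ["id1", "id1", "id1", "id2"]

# dict.fromkeys keeps the first occurrence of each key in insertion order
# (dicts preserve insertion order since Python 3.7), so this de-duplicates
# without reordering.
hashed_ids_unique = list(dict.fromkeys(hashed_ids))
print(hashed_ids_unique)  # ['id1', 'id2']
```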
Let's assume the docs are indexed with the structure below:
PUT test/_doc/1
{
  "id": "id1",
  "weight": 9
}
Now we can use update by query as below:
POST test/_update_by_query
{
  "query": {
    "terms": {
      "id": ["id1", "id2"]
    }
  },
  "script": {
    "source": "long weightToAdd = params.hashed_ids.stream().filter(idFromList -> ctx._source.id.equals(idFromList)).count(); ctx._source.weight += weightToAdd;",
    "params": {
      "hashed_ids": ["id1", "id1", "id1", "id2"]
    }
  }
}
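Since the question builds its request in Python, here is a sketch of the same update-by-query body as a Python dict. It keeps the answer's assumptions (documents store their id in a source field named id, and the index is test); the variable name update_by_query_body is only illustrative. The terms query gets the unique ids, while params gets the full list with duplicates so the script can count them.

```python
import json

hashed_ids = ["id1", "id1", "id1", "id2"]

# Assumption: each document stores its id in a source field named "id",
# as in the PUT example above.
script_source = (
    "long weightToAdd = params.hashed_ids.stream()"
    ".filter(idFromList -> ctx._source.id.equals(idFromList)).count(); "
    "ctx._source.weight += weightToAdd;"
)

update_by_query_body = {
    "query": {
        # Each id only needs to appear once to select the matching docs.
        "terms": {"id": list(dict.fromkeys(hashed_ids))}
    },
    "script": {
        "source": script_source,
        "lang": "painless",
        # Full list, duplicates included, so the script can count occurrences.
        "params": {"hashed_ids": hashed_ids},
    },
}

print(json.dumps(update_by_query_body, indent=2))
```

This dict would then be sent as the body of an _update_by_query request against the test index. How it is passed (a single body argument or separate query/script arguments) depends on the version of the Python client, so check its documentation. If a large batch makes scanning the list per document too slow, one option is to precompute the counts in Python (for example dict(Counter(hashed_ids))) and pass that map as the parameter instead.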
Explanation of the script:
The following gives the number of entries in the hashed_ids list that match the id of the current matching doc.
long weightToAdd = params.hashed_ids.stream().filter(idFromList -> ctx._source.id.equals(idFromList)).count();
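For readers more at home in Python, this line does the same thing as the comprehension below: it keeps the list entries equal to the current document's id and counts them. In the sketch, current_doc_id is a hypothetical stand-in for ctx._source.id.

```python
hashed_ids = ["id1", "id1", "id1", "id2"]
current_doc_id = "id1"  # hypothetical stand-in for ctx._source.id

# Same idea as the Painless stream: filter entries equal to the doc's id, then count.
weight_to_add = sum(1 for id_from_list in hashed_ids if id_from_list == current_doc_id)

# For a plain equality filter, list.count gives the same result.
assert weight_to_add == hashed_ids.count(current_doc_id)
print(weight_to_add)  # 3
```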
The following adds weightToAdd to the existing value of weight in the document.
ctx._source.weight += weightToAdd;
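Putting both lines together, here is a plain-Python simulation of what the script does to each document selected by the terms query. It runs on in-memory dicts, not against Elasticsearch. The id1 document matches the PUT example, while the id2 document and its starting weight of 2 are hypothetical.

```python
# Hypothetical in-memory stand-ins for the indexed documents' _source.
docs = [
    {"id": "id1", "weight": 9},  # the document from the PUT example
    {"id": "id2", "weight": 2},  # hypothetical second document
]
hashed_ids = ["id1", "id1", "id1", "id2"]
matched_ids = set(hashed_ids)  # the docs the terms query selects

for source in docs:
    if source["id"] not in matched_ids:
        continue  # docs outside the query are not touched
    weight_to_add = hashed_ids.count(source["id"])  # first script line
    source["weight"] += weight_to_add               # second script line

print(docs)  # [{'id': 'id1', 'weight': 12}, {'id': 'id2', 'weight': 3}]
```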