I am running an update by query as a task (wait_for_completion=false) with 'conflicts=proceed'. I expect version conflicts to happen occasionally, and I can see the count in the get task response (GET _tasks/<task_id>). I plan to reprocess the conflicted records.
Problem: When version conflicts do happen, the conflicted record IDs are not listed in the 'failures' array, so I cannot tell which records to reprocess. Any suggestions are greatly appreciated.
"response" : {
  "took" : 69055,
  "timed_out" : false,
  "total" : 286164,
  "updated" : 285885,
  "created" : 0,
  "deleted" : 0,
  "batches" : 287,
  "version_conflicts" : 279,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled" : "0s",
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until" : "0s",
  "throttled_until_millis" : 0,
  "failures" : [ ]
}
Note: If we run this with 'conflicts=abort' instead, we do see failure info in the response, as expected.
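For anyone checking this programmatically, here is a minimal Python sketch of reading the conflict count out of a stored task result. It assumes the body returned by GET _tasks/<task_id> has already been parsed into a dict with a top-level "completed" flag and the "response" object shown above; the helper name summarize_task and the trimmed sample document are hypothetical, made up for illustration.

```python
def summarize_task(task_doc):
    """Return (completed, version_conflicts, failures) from a parsed task result.

    task_doc is assumed to be the JSON body of GET _tasks/<task_id>,
    already decoded into a dict.
    """
    response = task_doc.get("response", {})
    return (
        bool(task_doc.get("completed", False)),
        int(response.get("version_conflicts", 0)),
        list(response.get("failures", [])),
    )


# Trimmed-down copy of the response from the question, for illustration only.
sample = {
    "completed": True,
    "response": {
        "total": 286164,
        "updated": 285885,
        "version_conflicts": 279,
        "failures": [],
    },
}

completed, conflicts, failures = summarize_task(sample)
# The task finished, but some documents were skipped and need another pass.
needs_rerun = completed and conflicts > 0
```

With 'conflicts=proceed' only the count is available here, which is why the check keys on version_conflicts rather than on the failures list.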
Below is the update by query request body being used:
{
  "conflicts": "proceed",
  "query": {
    "term": {
      "location_id": {
        "value": 121
      }
    }
  },
  "script": {
    "params": {},
    "source": "ctx._source.sys_updated_at = new Date();ctx._source.location_name = 'New York';"
  }
}
What you can do is modify your query to add a constraint that checks that location_name is not New York. The documents that conflicted were skipped, so they didn't have this field updated and still carry their old value.
On the next run, only the documents that conflicted will match and be updated, and you can re-run a few times until you get no conflicts.
This is what the query should look like (your field may be called location_name.keyword instead):
POST your-index/_update_by_query
{
  "conflicts": "proceed",
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "location_id": {
              "value": 121
            }
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "location_name": "New York"
          }
        }
      ]
    }
  },
  "script": {
    "params": {},
    "source": "ctx._source.sys_updated_at = new Date();ctx._source.location_name = 'New York';"
  }
}
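If you want to automate the "re-run until no conflicts" part, a rough Python sketch could look like the one below. It is not tied to any particular client: run_update is a hypothetical callable that you supply, which sends the body to your-index/_update_by_query and returns the parsed response as a dict. The function names, the match_field parameter and the max_rounds cap are all assumptions made for illustration, and the script passes the new name via params rather than hard-coding it, which is a small variation on the request above.

```python
def build_retry_body(location_id, new_name="New York", match_field="location_name"):
    """Build the request body above: match the location, skip docs already updated."""
    return {
        "conflicts": "proceed",
        "query": {
            "bool": {
                "filter": [{"term": {"location_id": {"value": location_id}}}],
                # Exclude documents that already carry the new value.
                "must_not": [{"term": {match_field: new_name}}],
            }
        },
        "script": {
            "params": {"name": new_name},
            "source": (
                "ctx._source.sys_updated_at = new Date();"
                "ctx._source.location_name = params.name;"
            ),
        },
    }


def run_until_clean(run_update, body, max_rounds=5):
    """Call run_update(body) until it reports zero version conflicts.

    run_update is a hypothetical callable returning the update by query
    response as a dict. Gives up after max_rounds attempts and returns
    every response seen, so the caller can inspect the last one.
    """
    history = []
    for _ in range(max_rounds):
        result = run_update(body)
        history.append(result)
        if result.get("version_conflicts", 0) == 0:
            break
    return history


# Stand-in for a real cluster: pretend conflicts shrink on each pass.
fake_responses = iter(
    [{"version_conflicts": 279}, {"version_conflicts": 3}, {"version_conflicts": 0}]
)
history = run_until_clean(lambda body: next(fake_responses), build_retry_body(121))
```

In real use, run_update would wrap whatever you already use to send the request. This only works as intended if the updates from the previous pass are visible to search before the next pass starts, so the index needs to have refreshed between runs.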