elasticsearch

Update by query (async / task) - No failure info - Handling Conflicts


I am running an update by query as a task (wait_for_completion=false) with 'conflicts=proceed'. I expect occasional version conflicts and can see their count in the get task response (GET _tasks/<task-id>). I plan to reprocess the conflicted records.

Problem: when version conflicts do happen, the IDs of the conflicted documents are not listed in the 'failures' array, so I have no way to tell which records to reprocess. Any suggestions are greatly appreciated.

"response" : {
    "took" : 69055,
    "timed_out" : false,
    "total" : 286164,
    "updated" : 285885,
    "created" : 0,
    "deleted" : 0,
    "batches" : 287,
    "version_conflicts" : 279,
    "noops" : 0,
    "retries" : {
      "bulk" : 0,
      "search" : 0
    },
    "throttled" : "0s",
    "throttled_millis" : 0,
    "requests_per_second" : -1.0,
    "throttled_until" : "0s",
    "throttled_until_millis" : 0,
    "failures" : [ ]
  }
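To see at a glance what the task reported, the relevant counters can be pulled out of the GET _tasks/<task-id> result. This is a minimal Python sketch, assuming the task result has already been fetched and parsed into a dict; `summarize_task` is a hypothetical helper, not part of any Elasticsearch client. With the numbers above, updated + version_conflicts + noops equals total (285885 + 279 + 0 = 286164), so every matched document that was not updated was a version conflict, yet "failures" is empty.

```python
from typing import Any, Dict


def summarize_task(task: Dict[str, Any]) -> Dict[str, Any]:
    """Extract conflict-related counters from a parsed GET _tasks/<task-id> result."""
    response = task.get("response", {})
    total = response.get("total", 0)
    updated = response.get("updated", 0)
    deleted = response.get("deleted", 0)
    noops = response.get("noops", 0)
    conflicts = response.get("version_conflicts", 0)
    return {
        "completed": bool(task.get("completed", False)),
        "total": total,
        "updated": updated,
        "version_conflicts": conflicts,
        # Entries in "failures"; with conflicts=proceed the question observes
        # this staying at 0 even when version_conflicts > 0.
        "failure_count": len(response.get("failures", [])),
        # Documents not explained by the counters above (rough consistency check).
        "unaccounted": total - updated - deleted - noops - conflicts,
    }


if __name__ == "__main__":
    # Counters copied from the task response above; "completed" is assumed.
    sample = {
        "completed": True,
        "response": {
            "total": 286164,
            "updated": 285885,
            "deleted": 0,
            "noops": 0,
            "version_conflicts": 279,
            "failures": [],
        },
    }
    print(summarize_task(sample))
```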

Note: if we run the same request with 'conflicts=abort', the failure info does appear in the response, as expected.

Below is the update-by-query request body being used:

{
    "conflicts": "proceed",
    "query": {
        "term": {
            "location_id": {
                "value": 121
            }
        }
    },
    "script": {
        "params": {},
        "source": "ctx._source.sys_updated_at = new Date();ctx._source.location_name = 'New York';"
    }
}
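Because the request runs with wait_for_completion=false, Elasticsearch returns a task ID right away, and the final counters only appear once the task reports "completed": true. Below is a minimal polling sketch in Python; `fetch_task` is a hypothetical callable you supply (for example, a thin wrapper that issues GET _tasks/<task-id> and returns the parsed JSON), so the loop itself makes no assumptions about which client library you use.

```python
import time
from typing import Any, Callable, Dict


def wait_for_task(
    fetch_task: Callable[[str], Dict[str, Any]],
    task_id: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
) -> Dict[str, Any]:
    """Poll a task until it reports completion, then return the full task result.

    fetch_task is hypothetical: any function returning the parsed body of
    GET _tasks/<task_id> for the given ID.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        task = fetch_task(task_id)
        if task.get("completed"):
            # The update-by-query counters live under task["response"].
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not completed after {timeout_seconds}s")
        time.sleep(poll_seconds)


if __name__ == "__main__":
    # Fake task API for illustration: reports completion on the third poll.
    calls = {"n": 0}

    def fake_fetch(task_id: str) -> Dict[str, Any]:
        calls["n"] += 1
        done = calls["n"] >= 3
        return {"completed": done, "response": {"version_conflicts": 279} if done else {}}

    result = wait_for_task(fake_fetch, "node:123", poll_seconds=0.0)
    print(calls["n"], result["response"]["version_conflicts"])
```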


Solution

  • One option is to add a constraint to your query that excludes documents whose location_name is already New York. The documents that hit a version conflict were skipped, so they still have their old location_name value.

    On the next run, only documents that still lack the new value, such as those that conflicted, will match and be updated. Re-run the request until the response reports zero version_conflicts.

    The query would look like this (if location_name is mapped as text, run the term query against location_name.keyword instead, because a term query on an analyzed text field will not match "New York"):

    POST your-index/_update_by_query
    {
      "conflicts": "proceed",
      "query": {
        "bool": {
          "filter": [
            {
              "term": {
                "location_id": {
                  "value": 121
                }
              }
            }
          ],
          "must_not": [
            {
              "term": {
                "location_name": "New York"
              }
            }
          ]
        }
      },
      "script": {
        "params": {},
        "source": "ctx._source.sys_updated_at = new Date();ctx._source.location_name = 'New York';"
      }
    }
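The re-run loop described in this answer can be automated. Below is a minimal Python sketch, assuming location_name has a keyword subfield (location_name.keyword, as with default dynamic mapping) and that `run_update_by_query` is a hypothetical callable you supply that submits the body, waits for the task to finish, and returns its "response" object. `build_retry_body` and `rerun_until_no_conflicts` are illustrative names, not Elasticsearch APIs. The script passes the new name through params instead of hard-coding it, a small deviation from the request above.

```python
from typing import Any, Callable, Dict

Body = Dict[str, Any]


def build_retry_body(location_id: int, new_name: str,
                     name_field: str = "location_name.keyword") -> Body:
    """Update-by-query body that skips documents already holding new_name."""
    return {
        "conflicts": "proceed",
        "query": {
            "bool": {
                "filter": [{"term": {"location_id": {"value": location_id}}}],
                # Documents that were updated already carry new_name, so only
                # the ones that were skipped (e.g. due to version conflicts) match.
                "must_not": [{"term": {name_field: new_name}}],
            }
        },
        "script": {
            "params": {"name": new_name},
            "source": "ctx._source.sys_updated_at = new Date(); "
                      "ctx._source.location_name = params.name;",
        },
    }


def rerun_until_no_conflicts(run_update_by_query: Callable[[Body], Dict[str, Any]],
                             body: Body, max_attempts: int = 5) -> int:
    """Re-submit body until a run reports zero version conflicts.

    Returns the number of runs needed; raises if conflicts persist.
    """
    for attempt in range(1, max_attempts + 1):
        response = run_update_by_query(body)
        if response.get("version_conflicts", 0) == 0:
            return attempt
    raise RuntimeError(f"still seeing version conflicts after {max_attempts} runs")
```

Capping the number of attempts avoids looping forever if another process keeps writing to the same documents while the update runs.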