python, elasticsearch, elasticsearch-dsl

Wait for completion of UpdateByQuery with the elasticsearch DSL


I'm working with a really large dataset and need to clean up (remove) a property on some documents, and immediately afterwards add that property to other documents. Sometimes the documents I remove the property from are the same ones I have to update afterwards. The problem is that I sometimes get a ConflictError, so I wonder how I can wait for the first query to finish completely before running the second one. This is the code I'm using:

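from elasticsearch_dsl import UpdateByQuery

# First pass: remove the property from the documents matched by query1
ubq = UpdateByQuery(using=self.es, index=self.index).update_from_dict(query1).script(source=script_remove_source)
ubq.execute()

# Second pass: add the property to the documents matched by query2
ubq = UpdateByQuery(using=self.es, index=self.index).update_from_dict(query2).script(source=script_add_source)
ubq.execute()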

Any idea?

The Elasticsearch docs mention the wait_for_completion parameter, but they don't show an example of how to use it. And in any case, that's not the Elasticsearch DSL. I read the DSL docs, but they say nothing about sync or async execution.
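For reference, wait_for_completion is a query parameter of the _update_by_query endpoint and defaults to true, so each call already blocks until its pass has finished. Setting it to false returns a task id instead, which can then be waited on explicitly through the tasks API. The sketch below shows that pattern; it assumes an elasticsearch-py 7.x-style client passed in as es, and the helper name is hypothetical.

```python
def update_by_query_and_wait(es, index, body, timeout="10m"):
    """Run _update_by_query as a background task and block until it finishes.

    Hypothetical helper: `es` is assumed to be an elasticsearch-py 7.x-style
    client, `body` a dict with "query" and "script" keys.
    """
    # wait_for_completion=False makes the call return at once with a task id;
    # refresh=True refreshes the affected shards once the task is done, so its
    # writes are visible to the next search
    started = es.update_by_query(
        index=index,
        body=body,
        wait_for_completion=False,
        refresh=True,
    )
    task_id = started["task"]

    # The tasks API blocks on this task for up to `timeout`
    result = es.tasks.get(task_id=task_id, wait_for_completion=True, timeout=timeout)
    if not result.get("completed"):
        raise RuntimeError(f"update_by_query task {task_id} has not completed")

    # "response" holds the usual counters: updated, version_conflicts, failures...
    return result["response"]
```

With elasticsearch-dsl, the same query parameters can be passed to the request with UpdateByQuery.params(), for example ubq.params(refresh=True) before calling execute().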

What I'm doing right now is putting a 3-second sleep between the two calls. It works, but it's an awful workaround.
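A likely reason the sleep helps: each update_by_query call blocks until it finishes by default, but its writes only become visible to search after an index refresh (every 1 second by default). Until then, the second pass snapshots the old versions of the documents the first pass rewrote, and writing over those stale versions raises a version conflict. Refreshing explicitly between the passes removes the guesswork. This is a minimal sketch against an elasticsearch-py 7.x-style client passed in as es, with body dicts standing in for the UpdateByQuery objects; the helper name is hypothetical.

```python
def run_both_passes(es, index, remove_body, add_body):
    """Hypothetical helper: remove the property, refresh, then add it.

    `es` is assumed to be an elasticsearch-py 7.x-style client; each body is
    a dict with "query" and "script" keys (query1/query2 plus the scripts).
    """
    # First pass; update_by_query blocks until it is done by default
    removed = es.update_by_query(index=index, body=remove_body)

    # Make the first pass's writes visible to search, so the second pass
    # snapshots the current document versions rather than stale ones
    es.indices.refresh(index=index)

    # Second pass now runs against the refreshed index
    added = es.update_by_query(index=index, body=add_body)
    return removed, added
```

With the DSL code from the question, the equivalent is calling self.es.indices.refresh(index=self.index) between the two execute() calls. If skipping conflicting documents is acceptable instead of failing, update_by_query also accepts conflicts="proceed"; the number of skipped documents is then reported in the response's version_conflicts field.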

Thanks in advance!


Solution

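  • I finally solved it by switching to the single-document update API with retry_on_conflict:

    # Partial update of one document. If its version changes between the
    # internal read and the write, Elasticsearch retries the update up to 5 times.
    es.update(
        index=index,
        doc_type=doc_type,  # only needed on older clusters that still use mapping types
        id=id_str,
        body={"doc": {session: state}},
        retry_on_conflict=5,
    )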
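The update above touches a single document. If the property has to be set on a known list of document ids, one minimal sketch is a loop over es.update with the same retry_on_conflict parameter. It assumes an elasticsearch-py 7.x-style client passed in as es; the helper name and its arguments are hypothetical, and doc_type is left out because mapping types are deprecated in 7.x and removed in 8.x.

```python
def set_property_on_docs(es, index, doc_ids, field, value, retries=5):
    """Hypothetical helper: apply the same partial update to several documents."""
    results = []
    for doc_id in doc_ids:
        # retry_on_conflict tells Elasticsearch to re-read the document and
        # re-apply the partial update up to `retries` times on a version conflict
        results.append(
            es.update(
                index=index,
                id=doc_id,
                body={"doc": {field: value}},
                retry_on_conflict=retries,
            )
        )
    return results
```

Each call is a separate request, so for very large id lists the bulk API would cut the number of round trips.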