I'm working with a very large dataset and need to remove a property from some documents and, immediately afterwards, add that property to other documents. Sometimes the documents that lose the property are the same ones I need to update next. The problem is that I sometimes get a ConflictError, and I wonder how I can wait for the first query to finish completely before running the second one. This is the code I'm using:
ubq = UpdateByQuery(using=self.es, index=self.index).update_from_dict(query1).script(source=script_remove_source)
ubq.execute()
ubq = UpdateByQuery(using=self.es, index=self.index).update_from_dict(query2).script(source=script_add_source)
ubq.execute()
Any idea?
In the Elasticsearch docs they mention the wait_for_completion parameter, but they don't show an example of its use. And anyway, that's not the Elasticsearch DSL. I read the DSL docs, but nothing is said about synchronous or asynchronous execution.
What I'm doing right now is putting a 3-second sleep between the two calls. It works, but that's completely awful.
Thanks in advance!
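For reference, update-by-query already waits for completion by default (wait_for_completion defaults to true), so the conflict is more likely caused by the second query searching before the first query's writes have been refreshed and become visible. A minimal sketch of one way to handle that from the DSL, assuming the request objects are elasticsearch_dsl UpdateByQuery instances, whose params() method forwards extra keyword arguments to the underlying update_by_query call. The helper name run_in_order is hypothetical:

```python
def run_in_order(first_ubq, second_ubq):
    """Run two update-by-query requests so the second sees the first's writes.

    Both arguments are assumed to be elasticsearch_dsl UpdateByQuery objects
    (anything with compatible .params() and .execute() methods works).
    """
    # refresh=True asks Elasticsearch to refresh the affected shards once the
    # request has finished, so the documents it changed are visible to the
    # search phase of the next request.
    first_response = first_ubq.params(refresh=True).execute()
    second_response = second_ubq.params(refresh=True).execute()
    return first_response, second_response
```

With the code from the question, the two UpdateByQuery(...).update_from_dict(...).script(...) expressions would be passed as first_ubq and second_ubq in place of the separate execute() calls and the sleep.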
I finally managed it with retry_on_conflict:
es.update(
    index=index,
    doc_type=doc_type,
    id=id_str,
    body={"doc": {
        session: state
    }},
    retry_on_conflict=5
)
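Note that retry_on_conflict belongs to the single-document update API. Update-by-query has no such parameter, only conflicts ("abort" or "proceed"). If updating documents one by one is too slow for a large dataset, a similar effect can be approximated by re-running the query while the response still reports version conflicts. This is only a sketch under a few assumptions: es is a low-level Elasticsearch client that accepts body= (as in the call above), the script is safe to run more than once, the query still matches the documents that were skipped, and the helper name update_by_query_with_retry is hypothetical:

```python
def update_by_query_with_retry(es, index, body, max_attempts=5):
    """Re-run an update-by-query until it reports no version conflicts.

    `es` is assumed to be a low-level Elasticsearch client; `body` holds the
    query and script, exactly as they would be sent to update_by_query.
    """
    response = None
    for _ in range(max_attempts):
        response = es.update_by_query(
            index=index,
            body=body,
            conflicts="proceed",  # count conflicting documents instead of aborting
            refresh=True,         # make the changes visible to the next attempt
        )
        if response["version_conflicts"] == 0:
            break
    # The last response is returned even if conflicts remain after
    # max_attempts, so the caller can inspect "version_conflicts" itself.
    return response
```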