c# amazon-web-services elasticsearch nest

Elasticsearch: Waiting for long-running requests to complete


What is the best way to know when a long-running Elasticsearch request has completed?

Today I have a process that periodically purges ~100K documents from an AWS-hosted Elasticsearch (ES) cluster that holds ~60M documents in total.

var settings = new ConnectionSettings(new Uri("https://mycompany.es.aws.com"));
settings.RequestTimeout(TimeSpan.FromMinutes(3)); // not sure this helps

var client = new ElasticClient(settings);

var request = new DeleteByQueryRequest("MyIndex") { ... };

// this call returns after ~60s with IsValid = true and HTTP status 504
var response = await client.DeleteByQueryAsync(request);

Even with the timeout set to 3 minutes, the call always returns in ~60s with an empty response and a 504 status code. In Kibana, however, I can see that the delete continues and completes properly over the next several minutes.

Is there a better way to start a long-running ES request and wait for it to complete?

UPDATE

Based on Simon Lang's response, I updated my code to use ES tasks. The final solution looks something like this:

var settings = new ConnectionSettings(new Uri("https://mycompany.es.aws.com"));
settings.RequestTimeout(TimeSpan.FromMinutes(3)); // not sure this helps

var client = new ElasticClient(settings);

var request = new DeleteByQueryRequest("MyIndex")
{
  Query = ...,
  // return a task right away instead of blocking until the delete finishes
  WaitForCompletion = false
};

var response = await client.DeleteByQueryAsync(request);

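if (response.IsValid)
{
  // poll the task until ES reports it as completed
  var taskCompleted = false;
  while (!taskCompleted)
  {
    var taskResponse = await client.GetTaskAsync(response.Task);
    if (!taskResponse.IsValid)
    {
      break; // the status call itself failed, so stop instead of looping forever
    }

    taskCompleted = taskResponse.Completed;

    if (!taskCompleted)
    {
      await Task.Delay(TimeSpan.FromSeconds(5));
    }
  }
}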
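
The polling loop above has no upper bound on how long it waits. A minimal sketch of a bounded wait follows. It assumes a hypothetical helper, `Polling.WaitUntilAsync`, which is not part of NEST, and the interval and timeout values are arbitrary choices for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of NEST): polls a condition with an overall time limit.
public static class Polling
{
    // Calls isDone repeatedly until it returns true or the timeout would be exceeded.
    // Returns true if the work finished, false if we gave up waiting.
    public static async Task<bool> WaitUntilAsync(
        Func<Task<bool>> isDone,
        TimeSpan interval,
        TimeSpan timeout,
        CancellationToken cancellationToken = default)
    {
        var elapsed = Stopwatch.StartNew();
        while (true)
        {
            if (await isDone())
            {
                return true;
            }

            // Give up if the next delay would take us past the time limit.
            if (elapsed.Elapsed + interval > timeout)
            {
                return false;
            }

            await Task.Delay(interval, cancellationToken);
        }
    }
}
```

With the client and response from the snippet above, the loop could then be written as `await Polling.WaitUntilAsync(async () => (await client.GetTaskAsync(response.Task)).Completed, TimeSpan.FromSeconds(5), TimeSpan.FromMinutes(10))`. A `false` result only means the caller stopped waiting; the delete may still be running on the cluster.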

Solution

  • I agree with @LeBigCat that the timeout comes from AWS and is not a NEST problem.

    But to address your question: the _delete_by_query request supports the wait_for_completion parameter. If you set it to false, the request returns immediately with a task id. You can then check the task's status through the task API.

    https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
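
At the REST level, the flow described in this answer comes down to two calls and two small JSON responses. The sketch below shows the request paths and how the relevant fields could be read. The class and method names are hypothetical, and it deliberately leaves out the HTTP transport and any authentication an AWS-hosted cluster may require.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper names; only the paths and JSON field names come from Elasticsearch.
public static class TaskApiSketch
{
    // POST this path (with the query as the body) to start the delete without waiting.
    public static string DeleteByQueryPath(string index) =>
        $"/{index}/_delete_by_query?wait_for_completion=false";

    // GET this path to check on a task returned by the call above.
    public static string TaskStatusPath(string taskId) => $"/_tasks/{taskId}";

    // Reads the "task" field (formatted as "nodeId:taskNumber") from the start response.
    public static string ReadTaskId(string json)
    {
        using var doc = JsonDocument.Parse(json);
        return doc.RootElement.GetProperty("task").GetString()
            ?? throw new InvalidOperationException("The task field was null.");
    }

    // Reads the "completed" flag from the task status response.
    public static bool ReadCompleted(string json)
    {
        using var doc = JsonDocument.Parse(json);
        return doc.RootElement.GetProperty("completed").GetBoolean();
    }
}
```

NEST wraps both calls, so this is mainly useful for seeing what `WaitForCompletion = false` and the task lookup translate to on the wire.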