
Server error while performing Elasticsearch Reindex in place operation


I am using the AWS Elasticsearch Service (version 6.3). I want to change the mapping while reindexing data from current_index to new_index. I am not upgrading from an older Elasticsearch cluster to a newer one; both current_index and new_index are on the same Elasticsearch 6.3 cluster.
I am trying to perform a "reindex in place" operation by following the Elastic documentation.
My index contains about 250k searchable documents. When I POST a _reindex request using curl,

curl -X POST "aws_elasticsearch_endpoint/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "current_index"
  },
  "dest": {
    "index": "new_index"
  }
}
'

Elasticsearch starts the reindex process (I verified this with GET /_cat/indices?v), but curl ends with the error curl: (56) Unexpected EOF. The reindex operation itself works fine: after about 2 hours, the docs.count of new_index matches that of current_index and the index health turns green.


If I POST _reindex from Java, I get this error:

java.net.SocketException: Unexpected end of file from server

The Reindex API returns successfully, as specified in the documentation, only when the number of documents in my index is small (I tried with about 1k searchable documents).


Solution

  • This happens because the response takes a long time to return and the request times out first. On small data sets the response comes back before the timeout, which is why you get a response there.

    The reindex keeps running after curl times out, though, and you can check how it is doing with this command:

    GET _tasks?actions=*reindex&detailed=true
    

    You can also add ...?wait_for_completion=false to the URL in your curl command. ES will then create a background task for your reindex operation. The curl command returns early with a task ID that you can use to check the state of the reindex periodically via the Task API:

    GET .tasks/task/<taskId>
    

    Also note that in this case, once the task is done, you need to remove the task document from the .tasks index yourself; ES will not do it for you.
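
The steps above could be scripted roughly as follows. This is only a sketch under a few assumptions: ES_ENDPOINT is a placeholder for your own endpoint, the function names are made up for illustration, and the task ID is assumed to come back in a "task" field of the JSON response. The status check here goes through GET _tasks/<taskId> rather than the .tasks index, and whether your AWS domain allows deleting from .tasks is something you would need to verify.

```shell
# Placeholder endpoint (assumption): replace with your own domain URL.
ES_ENDPOINT="${ES_ENDPOINT:-aws_elasticsearch_endpoint}"

# Start the reindex as a background task; prints the raw JSON response.
start_reindex() {
  curl -s -X POST "$ES_ENDPOINT/_reindex?wait_for_completion=false" \
    -H 'Content-Type: application/json' \
    -d '{"source":{"index":"current_index"},"dest":{"index":"new_index"}}'
}

# Read a JSON response on stdin and print the value of its "task" field.
# A plain sed match is used here to avoid depending on a JSON tool.
extract_task_id() {
  sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Ask the Tasks API for the current state of a task.
task_status() {
  curl -s "$ES_ENDPOINT/_tasks/$1"
}

# URL of the stored task record in the .tasks index.
task_record_url() {
  printf '%s/.tasks/task/%s' "$ES_ENDPOINT" "$1"
}

# Delete the stored task record once the reindex has finished.
delete_task_record() {
  curl -s -X DELETE "$(task_record_url "$1")"
}

# Example usage (commented out so that sourcing this file sends no requests):
# task_id=$(start_reindex | extract_task_id)
# task_status "$task_id"          # repeat until the task reports completion
# delete_task_record "$task_id"   # clean up the .tasks record afterwards
```

Because the first request returns as soon as the task is created, it should no longer run into the timeout, however long the reindex itself takes.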