I am bulk indexing documents containing country shapes (files here) into Elasticsearch, based on the cshapes dataset.
The geoshapes have a lot of points in "geometry":{"type":"MultiPolygon", and the bulk request takes a long time to complete (and sometimes does not complete, which is a separate, already reported problem).
Since the client times out (I use the official Elasticsearch Node.js client), I would like a way to check the status of the bulk request without having to set enormous timeout values.
What I would like is a status such as active/running, completed or aborted. I guess that just querying a single doc from the batch would not tell me whether the request has been aborted.
Is this possible?
I'm not sure if this is exactly what you're looking for, but it may be helpful. Whenever I'm curious about what my cluster is doing, I check the tasks API.
The tasks API shows you all of the tasks that are currently running on your cluster. It will give you information about individual tasks, such as the task ID, start time, and running time. Here's the command:
curl -XGET 'http://localhost:9200/_tasks?group_by=parents' | python -m json.tool
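If you only care about the bulk requests, you could filter that output yourself. Below is a minimal sketch, not a definitive implementation: it assumes the response has a top-level "tasks" object keyed by task ID (which is what I see with group_by=parents) and that bulk tasks have an action starting with indices:data/write/bulk. The function name summarize_bulk_tasks and the sample data are made up for illustration.

```python
# Assumed action prefix for bulk requests; check it against your own _tasks output.
BULK_ACTION_PREFIX = "indices:data/write/bulk"


def summarize_bulk_tasks(tasks_response):
    """Summarize the bulk-related parent tasks in a parsed
    _tasks?group_by=parents response (hypothetical helper)."""
    summaries = []
    for task_id, task in tasks_response.get("tasks", {}).items():
        # Skip everything that is not a bulk action.
        if not task.get("action", "").startswith(BULK_ACTION_PREFIX):
            continue
        summaries.append({
            "task_id": task_id,
            "action": task["action"],
            # The API reports running time in nanoseconds.
            "running_seconds": task.get("running_time_in_nanos", 0) / 1e9,
            "children": len(task.get("children", [])),
        })
    return summaries


# Hand-written sample shaped like a tasks response; all values are invented.
sample = {
    "tasks": {
        "node-1:101": {
            "node": "node-1",
            "id": 101,
            "action": "indices:data/write/bulk",
            "running_time_in_nanos": 2_500_000_000,
            "children": [
                {"node": "node-1", "id": 102,
                 "action": "indices:data/write/bulk[s]"},
            ],
        },
        "node-1:200": {
            "node": "node-1",
            "id": 200,
            "action": "cluster:monitor/tasks/lists",
            "running_time_in_nanos": 1_000_000,
        },
    }
}

for summary in summarize_bulk_tasks(sample):
    print(summary)
```

In practice you would feed it the parsed JSON from the curl command above instead of the sample. As far as I know, the tasks API only lists tasks that are still running, so a bulk task that has disappeared from the list has stopped one way or another; this tells you "active" versus "not active", but probably not "completed" versus "aborted" on its own.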