I need to extract some data with a query from one index in ElasticSearch (ES) and copy it to a temporary index in the same cluster. Then I need to delete the index I extracted the data from and rename the temporary one to the name of the deleted index.
I know that the latest versions of ES provide a built-in delete-by-query, but I'm tied to ES v2.3.5, which provides the same functionality through the delete-by-query plugin. The problem is that we have more than 20 nodes on which we would have to install the plugin and then do a full cluster restart, which we want to avoid.
After some research here on SO and some googling, I found an interesting scripting tool, ElasticDump. ElasticSearch-Exporting also looks interesting. But I would like some opinions from someone who has already used either of them, or suggestions for other options.
I have to do the same operation for around 100 indexes: extract data from index1 -> copy that data to a temporary index -> delete index1 -> rename the temporary index to index1, so tools that can automate the process are welcome. Anyway, I know that I could write a bash script using, for instance, ElasticDump to repeat the process for each of those 100 indexes.
Thanks in advance
Use the _reindex API (available since ES 2.3, so it works on 2.3.5 without any plugin) to create the new index.
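As a rough sketch of that step, a _reindex request can carry a query in its source section so that only the matching documents are copied. The index names, the query, and the ES_URL variable below are placeholders I made up for illustration, and the helper function names are hypothetical:

```shell
# Assumed cluster address; adjust to your environment.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the _reindex request body.
# $1 = source index, $2 = destination index, $3 = query as a JSON object.
reindex_body() {
  printf '{"source":{"index":"%s","query":%s},"dest":{"index":"%s"}}' "$1" "$3" "$2"
}

# Send the request; only documents matching the query are copied.
reindex_filtered() {
  curl -s -XPOST "$ES_URL/_reindex" -d "$(reindex_body "$1" "$2" "$3")"
}

# Example call (not run here); the term query is only a placeholder:
# reindex_filtered index1 index1_tmp '{"term":{"status":"active"}}'
```

Note that _reindex does not copy mappings or settings, so you will probably want to create the destination index with the right mappings before running it.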
Create an alias on the old index and use that alias in your applications. When you want to create the new index, just reindex the old one into a new index with a different name. Once the _reindex completes, you can remove the alias from the old index and add it on the new index, in one atomic operation. Like so:
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
"actions" : [
{ "remove" : { "index" : "test1", "alias" : "alias1" } },
{ "add" : { "index" : "test2", "alias" : "alias1" } }
]
}'
This should ensure that you have no downtime during the index switch, because your applications only ever see the alias. After switching the alias, you can delete the old index whenever you want. See: https://www.elastic.co/guide/en/elasticsearch/guide/current/index-aliases.html
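Since you have around 100 indexes, the whole sequence could be wrapped in a small script. This is only a sketch under several assumptions: each old index already has an alias named after it with an `_alias` suffix, the new index gets a `_v2` suffix, and indexes.txt is a hypothetical file with one index name per line. None of these names come from ES itself:

```shell
# Assumed cluster address; adjust to your environment.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the _aliases body that moves alias $1 from index $2 to index $3.
alias_swap_body() {
  printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}' "$2" "$1" "$3" "$1"
}

# Reindex one index through a query, swap its alias, then delete the old index.
# $1 = old index name, $2 = query as a JSON object.
migrate_index() {
  old="$1"; new="${1}_v2"; alias="${1}_alias"; query="$2"
  # Copy the matching documents; stop if the HTTP request fails.
  curl -sf -XPOST "$ES_URL/_reindex" \
    -d "{\"source\":{\"index\":\"$old\",\"query\":$query},\"dest\":{\"index\":\"$new\"}}" || return 1
  # Move the alias to the new index in one atomic request.
  curl -sf -XPOST "$ES_URL/_aliases" -d "$(alias_swap_body "$alias" "$old" "$new")" || return 1
  # Drop the old index only after the alias has moved.
  curl -sf -XDELETE "$ES_URL/$old"
}

# Example loop (not run here):
# while read -r idx; do migrate_index "$idx" '{"match_all":{}}' || break; done < indexes.txt
```

The `-f` flag only catches HTTP-level errors, so it is worth checking the _reindex response for reported failures before letting the script delete anything.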