python, elasticsearch, elasticsearch-py

How to read Elasticsearch snapshots in Python?


I've been tasked with an ETL job to move JSON data out of Elasticsearch into Azure Blob storage. I've set up a batch job for the active indices using elasticsearch-py's search with search_after and a point in time (PIT). We are running on ES 7.x, but until recently we were on ES 5.x, and we stored a snapshot of all indices before deleting them from the cluster. I need to get the historical data, and I have access to the S3 bucket where the team stored the snapshots.
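For context, a minimal sketch of what such a batch extract could look like, not the actual job. It assumes an elasticsearch-py 7.x client against a cluster new enough to support PIT (7.10+) and the `_shard_doc` sort tiebreaker (7.12+). The function name `iter_index_docs`, the page size, and the keep-alive value are illustrative choices.

```python
def iter_index_docs(es, index, page_size=1000, keep_alive="2m"):
    """Yield the _source of every document in `index`, page by page.

    `es` is assumed to be an elasticsearch-py 7.x client instance.
    """
    # Open a point in time so every page sees the same view of the index.
    pit_id = es.open_point_in_time(index=index, keep_alive=keep_alive)["id"]
    search_after = None
    try:
        while True:
            body = {
                "size": page_size,
                "query": {"match_all": {}},
                "pit": {"id": pit_id, "keep_alive": keep_alive},
                # Assumption: _shard_doc is available (ES 7.12+).
                "sort": [{"_shard_doc": "asc"}],
            }
            if search_after is not None:
                body["search_after"] = search_after

            resp = es.search(body=body)
            # The PIT id can change between requests; keep the latest one.
            pit_id = resp.get("pit_id", pit_id)

            hits = resp["hits"]["hits"]
            if not hits:
                break
            for hit in hits:
                yield hit["_source"]
            # Resume after the sort values of the last hit on this page.
            search_after = hits[-1]["sort"]
    finally:
        es.close_point_in_time(body={"id": pit_id})
```

Each yielded dict can then be serialized with `json.dumps` and written to the blob container by whatever uploader the job uses.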

The question is: without setting up a separate 5.x cluster, restoring the snapshots, and running the batch extract from there, is there an efficient method (maybe a Python package) that would let me read the indices stored in the S3 bucket and extract the data directly?


Solution

  • Closing this question, as the answer is no (at this time). I ended up restoring all the snapshots on separate VMs and pulling the data out from there.
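
For anyone taking the same route, here is a rough sketch of the restore step, not the exact procedure used above. It assumes a temporary cluster on a version that can read the old snapshots (a snapshot can only be restored into a cluster at most one major version newer than the one that created the index, so 5.x snapshots cannot go straight into 7.x), with the repository-s3 plugin installed and S3 credentials already configured. The function name, repository name, and bucket name are hypothetical.

```python
def restore_all_snapshots(es, repository, bucket):
    """Register the S3 bucket as a read-only repository and restore every index once.

    `es` is assumed to be an elasticsearch-py client matching the temporary
    cluster's major version. Returns the restored index names.
    """
    # Read-only, so the temporary cluster cannot modify the stored snapshots.
    es.snapshot.create_repository(
        repository=repository,
        body={"type": "s3", "settings": {"bucket": bucket, "readonly": True}},
    )

    snapshots = es.snapshot.get(repository=repository, snapshot="_all")["snapshots"]
    # Newest first, so each index is restored from its most recent snapshot.
    snapshots.sort(key=lambda s: s.get("start_time_in_millis", 0), reverse=True)

    restored = []
    for snap in snapshots:
        # Restoring an index that already exists and is open fails, so skip those.
        new = [name for name in snap["indices"] if name not in restored]
        if not new:
            continue
        es.snapshot.restore(
            repository=repository,
            snapshot=snap["snapshot"],
            body={"indices": ",".join(new), "include_global_state": False},
            wait_for_completion=True,
        )
        restored.extend(new)
    return restored
```

Once the indices are restored, they can be read with an ordinary scroll or search-based extract against the temporary cluster.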