I've been tasked with an ETL job to get JSON data out of Elasticsearch into Azure Blob Storage. For the active indices I've set up a batch job using elasticsearch-py's search, search_after, and pit (point-in-time) APIs. We are running ES 7.x, but until recently we ran ES 5.x, and the team stored a snapshot of every index in an S3 bucket before deleting it from the cluster. I now need that historical data, and I have access to the S3 bucket where the snapshots are stored.
The question: without having to spin up a separate 5.x cluster, restore the snapshots, and run the batch extract from there, is there an efficient method (perhaps a Python package) that lets me read the indices stored in the S3 bucket and extract the data directly?
Closing this question, as the answer is NO (at this time). I simply restored all snapshots on separate VMs and pulled the data out from there.
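For anyone taking the same route: the restore path amounts to registering the existing S3 bucket as a read-only snapshot repository on a throwaway node of the matching major version, then restoring and running the batch extract as usual. A configuration sketch (repository, bucket, and snapshot names are placeholders; the 5.x node needs the repository-s3 plugin installed):

```shell
# Register the existing S3 bucket as a read-only snapshot repository.
curl -X PUT "localhost:9200/_snapshot/legacy_s3" -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "my-snapshot-bucket",
    "readonly": true
  }
}'

# List the snapshots stored in that repository.
curl -X GET "localhost:9200/_snapshot/legacy_s3/_all"

# Restore one snapshot; its indices can then be batch-extracted.
curl -X POST "localhost:9200/_snapshot/legacy_s3/snapshot_1/_restore"
```

Marking the repository readonly keeps the throwaway cluster from ever writing back into the archived bucket.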