Imagine you had a large CSV file - let's say 1 billion rows.
You want each row in the file to become a document in Elasticsearch.
You can't load the file into memory - it's too large, so it has to be streamed or chunked.
The time taken is not a problem. The priority is making sure ALL data gets indexed, with no missing data.
What do you think of this approach:
Part 1: Prepare the data
Part 2: Upload the data
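Concretely, here's a rough sketch of what I have in mind for the streaming/chunked upload, using Python with the elasticsearch-py client and its streaming_bulk helper. The index name, file path and chunk size are just placeholders:

    import csv
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import streaming_bulk

    INDEX_NAME = "rows"       # placeholder index name
    CSV_PATH = "data.csv"     # placeholder file path
    CHUNK_SIZE = 5000         # rows per bulk request

    es = Elasticsearch("http://localhost:9200")

    def generate_actions(path):
        """Stream the CSV row by row so the whole file never sits in memory."""
        with open(path, newline="") as f:
            for i, row in enumerate(csv.DictReader(f)):
                yield {
                    "_index": INDEX_NAME,
                    "_id": i,          # deterministic id, so re-runs overwrite instead of duplicating
                    "_source": row,
                }

    failed = 0
    for ok, item in streaming_bulk(
        es,
        generate_actions(CSV_PATH),
        chunk_size=CHUNK_SIZE,
        raise_on_error=False,  # keep going and count failures instead of aborting
    ):
        if not ok:
            failed += 1
            print("failed:", item)

    print("done, failures:", failed)

The deterministic _id is there for the "no missing data" requirement: if the job dies, it can be re-run from the start (or from a checkpointed row number) without creating duplicates.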
The steps you've mentioned above look good. A couple of other things will make sure ES does not come under too much load:
Make sure refresh_interval is set to a large value (or -1 to disable refreshes entirely while the load runs). This ensures the index is not refreshed, i.e. documents are not made searchable, after every batch, which keeps indexing overhead down. IMO the default value (1s) will also do; see the Elasticsearch docs on refresh_interval for details.

As the above comment suggests, it'd be better if you start with a smaller batch of data. And of course, if you use constants instead of hardcoding the values, your task just gets easier.
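For example, a minimal sketch of toggling refresh_interval around the bulk load, using elasticsearch-py with the 7.x-style body parameter (newer clients accept settings= instead); the index name and run_bulk_load() are placeholders for your own code:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")
    INDEX_NAME = "rows"  # placeholder: same index the bulk load writes to

    # Disable refreshes for the duration of the load.
    es.indices.put_settings(
        index=INDEX_NAME,
        body={"index": {"refresh_interval": "-1"}},
    )

    try:
        run_bulk_load()  # hypothetical: the streaming_bulk loop from the question
    finally:
        # Restore the default and force one refresh so the data becomes searchable.
        es.indices.put_settings(
            index=INDEX_NAME,
            body={"index": {"refresh_interval": "1s"}},
        )
        es.indices.refresh(index=INDEX_NAME)

Restoring the setting in a finally block means a failed run doesn't leave the index with refreshes disabled.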