Possible to index 1M docs/sec in Elasticsearch?

I am trying to optimize indexing speed in Elasticsearch. We rebuild our indexes every hour, so the faster we can reindex the data, the smaller the lag.

I came across this article, which describes reaching a reindexing throughput of 100K: https://thoughts.t37.net/how-we-reindexed-36-billions-documents-in-5-days-within-the-same-elasticsearch-cluster-cd9c054d1db8#.4w3kl9ebf, and this StackOverflow question, which reports a higher rate: ElasticSearch - high indexing throughput.

My question is whether it is possible to achieve a sustained indexing throughput of 1 million documents per second, and if so, how?

Solution

It depends on a few factors, but there is no reason it should be impossible. Here are a few key factors that affect indexing speed:

size of the documents (smaller is faster)
number of cores and size of memory (more is faster)
number of machines (more is faster)
number of replicas (fewer is faster)
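Of the factors above, the replica count is the one you can change on a live index. Here is a minimal sketch of building the settings request, using only the Python standard library. The host `http://localhost:9200`, the index name `my-index`, and the helper name `replica_settings_request` are assumptions made for illustration. The request is only built here, not sent.

```python
import json
import urllib.request


def replica_settings_request(base_url: str, index: str, replicas: int) -> urllib.request.Request:
    """Build (but do not send) a PUT <index>/_settings request that sets the replica count."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed host and index name; replace with your own.
req = replica_settings_request("http://localhost:9200", "my-index", 0)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually apply it: urllib.request.urlopen(req)
```

A common pattern is to drop replicas to 0 for the bulk load and restore the original count afterwards, at the cost of having no redundancy while the load runs.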

As an example, with small documents and a single eight-core machine, I was able to index at about 70k-120k docs/s. Add more cores or machines and you could approach 1M docs/s.
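As a back-of-the-envelope check on those numbers, the sketch below estimates how many such machines a 1M docs/s target would need. It assumes throughput scales linearly with machine count, which is optimistic: coordination, replication, and network overhead usually reduce the gain per added node. The function name `machines_needed` is made up for this example.

```python
import math

TARGET_DOCS_PER_SEC = 1_000_000


def machines_needed(per_machine_rate: int, target: int = TARGET_DOCS_PER_SEC) -> int:
    """Machines required to reach the target rate, assuming perfectly linear scaling."""
    return math.ceil(target / per_machine_rate)


best_case = machines_needed(120_000)   # upper end of the measured single-machine range
worst_case = machines_needed(70_000)   # lower end of the measured single-machine range
print(f"roughly {best_case} to {worst_case} eight-core machines")
```

Treat the result as a lower bound on hardware, not a sizing recommendation.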

Update: Another test run with Elasticsearch 6.1.0, on a single 32-core E5 with a 64G JVM heap. Here, esbulk could index about 330,000 docs/s, using 10M small documents of 20-40 bytes each.
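Rates like this rely on the bulk API rather than one request per document. The sketch below is not esbulk's code. It only illustrates the newline-delimited body format that the `_bulk` endpoint accepts, together with a simple batching helper. The names `batches` and `bulk_body` and the sample documents are assumptions for illustration. The empty `{"index":{}}` action line assumes the target index (and, on 6.x, the type) is given in the request URL.

```python
import json
from itertools import islice
from typing import Iterable, Iterator, List


def batches(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_body(docs: Iterable[dict]) -> str:
    """Serialize documents as NDJSON: one action line, then one source line, per document."""
    lines = []
    for doc in docs:
        lines.append('{"index":{}}')
        lines.append(json.dumps(doc, separators=(",", ":")))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample data: five tiny documents, sent two per request.
docs = ({"id": i} for i in range(5))
bodies = [bulk_body(batch) for batch in batches(docs, 2)]
print(len(bodies))
print(bodies[0], end="")
```

Each body would be POSTed to the `_bulk` endpoint with `Content-Type: application/x-ndjson`. Running several such senders in parallel is what lets the client keep all of the cluster's cores busy.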

Disclaimer: I wrote esbulk. The README contains a few measurements; the maximum at the moment is about 300k docs/s.