
Is it possible to index 1M docs/sec in Elasticsearch?


I am trying to optimize indexing speed in Elasticsearch. We reindex our indexes every hour, so the faster we can reindex our data, the less lag we have.

I came across this article, which describes reaching a reindexing throughput of 100K documents per second: https://thoughts.t37.net/how-we-reindexed-36-billions-documents-in-5-days-within-the-same-elasticsearch-cluster-cd9c054d1db8#.4w3kl9ebf, and this StackOverflow question, which achieves a higher rate: ElasticSearch - high indexing throughput.

My question is whether it is possible to achieve a sustained indexing throughput of 1 million documents per second, and if so, how?
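To put the target in perspective, here is a small back-of-the-envelope calculation. The 36 billion figure comes from the linked article's title; the two functions are plain arithmetic and assume a perfectly constant sustained rate, ignoring ramp-up, segment merges and other stalls:

```python
def reindex_seconds(num_docs: int, docs_per_sec: float) -> float:
    """Wall-clock time to reindex num_docs at a constant sustained rate."""
    if docs_per_sec <= 0:
        raise ValueError("docs_per_sec must be positive")
    return num_docs / docs_per_sec


def docs_per_window(docs_per_sec: float, window_sec: float = 3600) -> float:
    """How many documents a sustained rate covers in one window (default: one hour)."""
    return docs_per_sec * window_sec


if __name__ == "__main__":
    # 36 billion documents, the figure from the linked article's title
    n = 36_000_000_000
    for rate in (100_000, 1_000_000):
        hours = reindex_seconds(n, rate) / 3600
        print(f"{rate:>9,} docs/s: {hours:,.1f} h for {n:,} docs, "
              f"{docs_per_window(rate):,.0f} docs per hour")
```

At 100K docs/s, 36 billion documents take 100 hours (about 4.2 days), which is in line with the article's 5 days; at 1M docs/s the same job would take 10 hours, and one hour of sustained indexing would cover 3.6 billion documents.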


Solution

  • It will depend on a few factors, but why should it be impossible? Here are a few key factors that will speed up the indexing process:

    • size of the documents (smaller is faster)
    • number of cores and size of memory (more is faster)
    • number of machines (more is faster)
    • number of replicas (fewer is faster)

    As an example, with small documents and a single eight-core machine, I was able to index about 70k-120k docs/s. Throw in a few more cores or machines and you could approach 1M docs/s.


    Update: In another test run with Elasticsearch 6.1.0 on a single 32-core E5 machine with a 64 GB JVM heap, esbulk indexed about 330,000 docs/s, using 10M small documents of 20-40 bytes each.


    Disclaimer: I wrote esbulk. The README contains a few measurements; the maximum at the moment is about 300k docs/s.
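Bulk loaders such as esbulk send documents through Elasticsearch's `_bulk` API, which takes newline-delimited JSON: one action/metadata line followed by one source line per document. The sketch below is not esbulk's code; it only builds request bodies and does no networking. The index name `reindex-target`, the batch size, the `doc` type and the restored replica count are hypothetical choices. Pre-7.0 clusters (such as the 2.x and 6.1 versions mentioned here) expect a `_type`, which is why it is optional. Setting `index.number_of_replicas` to 0 during the load reflects the "fewer replicas is faster" point above:

```python
import json
from typing import Iterable, Iterator, List, Optional


def chunked(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group documents into batches of at most `size` (one bulk request each)."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(index: str, docs: Iterable[dict], doc_type: Optional[str] = None) -> str:
    """Serialize documents into the newline-delimited JSON body of a _bulk request.

    Every document becomes two lines: an action/metadata line, then the source.
    Pass doc_type for pre-7.0 clusters that expect a _type.
    """
    lines: List[str] = []
    for doc in docs:
        meta = {"_index": index}
        if doc_type is not None:
            meta["_type"] = doc_type
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(doc))
    # The _bulk API requires every line, including the last, to end with "\n".
    return "".join(line + "\n" for line in lines)


def replica_settings(replicas: int) -> str:
    """JSON body for PUT /<index>/_settings that sets the replica count."""
    return json.dumps({"index": {"number_of_replicas": replicas}})


if __name__ == "__main__":
    # Hypothetical tiny documents, similar in spirit to the small ones above.
    docs = ({"id": i, "name": f"doc-{i}"} for i in range(5))
    print(replica_settings(0))  # send before the bulk load
    for batch in chunked(docs, 2):
        # Each body would be POSTed to /_bulk with Content-Type: application/x-ndjson.
        print(bulk_body("reindex-target", batch, doc_type="doc"), end="")
    print(replica_settings(1))  # restore replicas afterwards (1 is an assumption)
```

In practice a loader would send several such batches concurrently; the batch size and the degree of parallelism are the knobs worth measuring on your own hardware and documents.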