
Choosing between MongoDB and Elasticsearch - Scaling/Sharding


I'm currently deciding between MongoDB and Elasticsearch as the backend for a logging and analytics platform. I plan to use a cluster of 5 Intel Xeon quad-core servers, each with 64GB RAM and a 500GB NVMe drive. With 1 replica (every document stored twice), I'm guessing it should support 1TB+ of data.
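The 1TB+ guess can be sanity-checked with simple arithmetic: 5 nodes × 500GB is 2.5TB raw, and one replica halves that. The sketch below is a rough estimate under stated assumptions, not a vendor sizing formula: data is spread evenly, each replica is a full copy, and only 85% of each disk is treated as fillable (mirroring Elasticsearch's default low disk watermark). The function name and the `headroom` parameter are illustrative.

```python
# Rough usable-capacity estimate for the cluster described above.
# Assumptions: even data spread, every replica is a full copy, and only
# `headroom` of each disk is usable (0.85 mirrors Elasticsearch's default
# low disk watermark of 85%).

def usable_capacity_gb(nodes: int, disk_gb: float, replicas: int,
                       headroom: float = 0.85) -> float:
    """Return how much primary data (GB) the cluster can hold."""
    raw_gb = nodes * disk_gb             # total disk across all nodes
    fillable_gb = raw_gb * headroom      # leave free space for merges/relocation
    return fillable_gb / (1 + replicas)  # each document is stored 1 + replicas times


if __name__ == "__main__":
    # 5 nodes x 500 GB with one replica
    print(usable_capacity_gb(5, 500, 1))       # 1062.5
    print(usable_capacity_gb(5, 500, 1, 1.0))  # 1250.0 (no headroom at all)
```

So roughly 1TB of primary data fits, but with little slack; index overhead (mappings, doc values, inverted indices) can make the on-disk size larger than the raw log volume.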

From what I've read about Elasticsearch, the recommended setup for the above servers would be 5-10 shards, but the shard count cannot be increased later without a huge migration. So maybe I can add 5 more servers/nodes to the cluster for the same index, but not 10 or 20, because I can't create more shards to spread across the new nodes/servers. Is that correct?
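For reference, the shard count the question is worried about is set in the index settings when the index is created. The sketch below only builds the JSON body for a `PUT /<index>` request; the index name `logs-2024.01.01` is hypothetical, while `number_of_shards` and `number_of_replicas` are Elasticsearch's own index settings.

```python
import json

# Hypothetical index name; the settings keys are Elasticsearch's own.
INDEX_NAME = "logs-2024.01.01"


def create_index_body(primaries: int, replicas: int) -> dict:
    """Body for PUT /<index>. number_of_shards is fixed once the index exists
    (short of _split/_shrink or a reindex); number_of_replicas can be
    changed at any time."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primaries,
                "number_of_replicas": replicas,
            }
        }
    }


if __name__ == "__main__":
    print(f"PUT /{INDEX_NAME}")
    print(json.dumps(create_index_body(5, 1), indent=2))
```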

MongoDB appears to manage sharding automatically based on a shard key, splitting the data into chunks and redistributing them as more nodes are added. So does that mean I could add 50 more servers to the cluster in the future and MongoDB would happily spread the data from this one collection across all of them?
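The idea behind the MongoDB side of the question can be shown with a toy model: a sharded collection is cut into many chunks keyed by a (here hashed) shard key, and when shards are added, whole chunks migrate rather than individual documents. This is a hypothetical illustration, NOT MongoDB's balancer: the chunk count, the MD5-based routing and the "fullest to emptiest" policy are assumptions, and the real balancer's criteria differ between versions. In mongosh the real entry point is `sh.shardCollection()`.

```python
import hashlib
from collections import Counter

# Toy model only, NOT MongoDB's balancer. Chunks are keyed by a hashed
# shard key; adding shards moves whole chunks, not individual documents.
NUM_CHUNKS = 64  # hypothetical; MongoDB decides chunk sizes itself


def chunk_for(shard_key: str) -> int:
    """Route a shard-key value to a chunk with a stable hash."""
    digest = hashlib.md5(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_CHUNKS


def rebalance(owner: list[int], num_shards: int) -> int:
    """Move chunks from the fullest to the emptiest shard until chunk counts
    differ by at most one. Mutates `owner` (index = chunk, value = shard)
    and returns the number of chunk migrations performed."""
    moves = 0
    while True:
        counts = Counter({shard: 0 for shard in range(num_shards)})
        counts.update(owner)  # count chunks per shard
        fullest = max(counts, key=counts.get)
        emptiest = min(counts, key=counts.get)
        if counts[fullest] - counts[emptiest] <= 1:
            return moves
        owner[owner.index(fullest)] = emptiest  # migrate one chunk
        moves += 1


if __name__ == "__main__":
    owner = [chunk % 5 for chunk in range(NUM_CHUNKS)]  # start with 5 shards
    moves = rebalance(owner, 10)                        # add 5 more shards
    print(moves, sorted(Counter(owner).values()))
    # A document follows its chunk, wherever that chunk now lives.
    print("host-42 ->", owner[chunk_for("host-42")])
```

Because there are many more chunks than shards, the same mechanism keeps working at 15 or 55 shards; that is the property the question is asking about.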

I only need about 1TB of storage right now, but I don't want to paint myself into a corner should this one dataset grow to 100TB.

Short of creating the index with 100 shards from the start, which seems inefficient and bad practice, how can Elasticsearch scale past 5-10 servers for this single dataset?


Solution

    1. As Val said, you would normally use time-based indices, so you can cheaply remove data after a certain retention period by deleting whole indices instead of individual documents. As your requirements change over time, you change the shard count for new indices (normally through an index template).
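    2. Current versions of Elasticsearch (6.1 and later) support a _split API, which does exactly what you are asking for: use 5 shards initially, but keep the option to split later into any multiple of the current shard count that is also a factor of index.number_of_routing_shards. With that setting at 30 (just as an example), 5 -> 10 -> 30, 5 -> 15 -> 30, or 5 -> 30 would all be options.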
    3. If you have 5 primary shards and 1 replica, you can still spread the load over 10 nodes: each write is indexed on the primary and then on its replica, and reads can be served by either copy. Elasticsearch's write/read model is generally different from MongoDB's.
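Point 1 (time-based indices plus an index template) can be sketched as follows. The daily naming scheme `logs-YYYY.MM.DD` and the helper names are hypothetical; `TEMPLATE_BODY` follows the shape of a composable index template (`PUT /_index_template/<name>`, Elasticsearch 7.8+). Changing `number_of_shards` in the template only affects indices created afterwards, and each expired index would then be dropped with `DELETE /<index>`.

```python
from datetime import date, datetime, timedelta

# Hypothetical naming scheme: one index per day, e.g. "logs-2024.01.31".
PREFIX = "logs-"

# Body for PUT /_index_template/logs (composable template, ES 7.8+).
# New indices matching "logs-*" pick up these settings; existing ones keep theirs.
TEMPLATE_BODY = {
    "index_patterns": [PREFIX + "*"],
    "template": {"settings": {"number_of_shards": 5, "number_of_replicas": 1}},
}


def daily_index(day: date) -> str:
    """Name of the index that holds one day's logs."""
    return f"{PREFIX}{day:%Y.%m.%d}"


def expired_indices(existing: list[str], today: date, retention_days: int) -> list[str]:
    """Indices older than the retention window; dropping a whole index is
    much cheaper than deleting its documents one by one."""
    cutoff = today - timedelta(days=retention_days)
    return [
        name for name in existing
        if datetime.strptime(name[len(PREFIX):], "%Y.%m.%d").date() < cutoff
    ]


if __name__ == "__main__":
    existing = [daily_index(date(2024, 1, d)) for d in range(1, 11)]
    print(expired_indices(existing, date(2024, 1, 10), 7))
    # ['logs-2024.01.01', 'logs-2024.01.02']
```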
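The split rule in point 2 can be checked mechanically. A sketch, assuming the documented constraint that a split target must be a multiple of the current shard count and must divide `index.number_of_routing_shards` (which is fixed when the index is created). The request bodies are shown only as data; the index names are hypothetical. Note that the source index has to be write-blocked before it can be split.

```python
# Which shard counts can one _split reach? Assumed rule: the target is a
# multiple of the current shard count and divides number_of_routing_shards.

def split_targets(current: int, routing_shards: int) -> list[int]:
    """Shard counts reachable from `current` with a single _split call."""
    return [
        target
        for target in range(current + 1, routing_shards + 1)
        if target % current == 0 and routing_shards % target == 0
    ]


# Request bodies for the steps involved (index names are hypothetical):
#   PUT  /logs                    create with routing shards set up front
#   PUT  /logs/_settings          block writes, required before splitting
#   POST /logs/_split/logs-split  create the split copy
CREATE_BODY = {"settings": {"index.number_of_shards": 5,
                            "index.number_of_routing_shards": 30}}
BLOCK_WRITES_BODY = {"settings": {"index.blocks.write": True}}
SPLIT_BODY = {"settings": {"index.number_of_shards": 10}}

if __name__ == "__main__":
    print(split_targets(5, 30))   # [10, 15, 30]
    print(split_targets(10, 30))  # [30]
    print(split_targets(5, 20))   # [10, 20]: 30 is not reachable here
```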
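Point 3 is back-of-the-envelope arithmetic: the number of nodes one index can keep busy is bounded by its total shard copies. The sketch assumes that Elasticsearch never places two copies of the same shard on one node and that copies are spread evenly; the function name is hypothetical. Since `number_of_replicas` is a dynamic setting, raising it is one way to put more nodes to work for reads, at the cost of indexing every write on each extra copy.

```python
# Upper-bound check for point 3. Assumptions: no node holds two copies of
# the same shard, and copies are spread evenly across nodes.

def shard_copy_layout(primaries: int, replicas: int, nodes: int) -> dict:
    total_copies = primaries * (1 + replicas)
    busy = min(nodes, total_copies)  # nodes holding at least one copy
    return {
        "total_copies": total_copies,
        "busy_nodes": busy,
        "idle_nodes": nodes - busy,  # nodes with nothing from this index
        # all replicas can be assigned only if each copy of a shard gets its own node
        "all_copies_assigned": nodes >= 1 + replicas,
    }


if __name__ == "__main__":
    print(shard_copy_layout(5, 1, 10))  # 10 copies -> 10 busy nodes
    print(shard_copy_layout(5, 1, 20))  # 10 of 20 nodes idle for this index
    print(shard_copy_layout(5, 3, 20))  # more replicas -> all 20 nodes used
```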

    PS disclaimer: I work for Elastic now, but I have used MongoDB in production for 5 years as well.