I'm currently deciding between MongoDB and Elasticsearch as the backend for a logging and analytics platform. I plan to use a cluster of 5 quad-core Intel Xeon servers, each with 64GB RAM and a 500GB NVMe drive. With 1 replica, I'm guessing that should support 1TB+ of data.
From what I've read about Elasticsearch, the recommended setup for the servers above would be 5-10 shards, but the shard count cannot be increased later without a huge migration. So maybe I can add 5 more servers/nodes to the cluster for the same index, but not 10 or 20, because I can't create more shards to spread across the new nodes/servers - correct?
MongoDB appears to automatically manage sharding based on a key value and to redistribute those shards as more nodes are added. So does that mean I can add 50 more servers to the cluster in the future and MongoDB will happily spread the data from this one dataset across all of them?
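For reference, here is roughly what I'd expect the MongoDB side to look like: a minimal sketch with pymongo, assuming a mongos router on localhost:27017 and a made-up logs.events collection sharded on a compound key (all the names are just placeholders).

```python
# Sketch only: assumes a mongos router on localhost:27017 and an
# illustrative "logs.events" collection; names are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # connect to the mongos router

# Enable sharding for the database, then shard the collection on a compound key.
# The balancer redistributes chunks automatically as shard servers are added.
client.admin.command("enableSharding", "logs")
client.admin.command(
    "shardCollection",
    "logs.events",
    key={"host": 1, "timestamp": 1},
)
```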
I basically only need 1TB of storage right now, but don't want to paint myself into a corner, should this 1 dataset end up growing to 100TB.
Without starting Elasticsearch with 100 shards at the beginning, which seems inefficient and bad practice, how can it scale past 5/10 servers for this single dataset?
There is now the `_split` API, which does exactly what you are asking for: use 5 shards initially, but keep the option to go up to any factor of 30 (just as an example), so 5 -> 10 -> 30 would be options.

PS disclaimer: I work for Elastic now, but I have used MongoDB in production for 5 years as well.
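As a rough sketch of that workflow against the REST API (via Python's requests; the host, index names, and the routing-shard count of 30 are placeholders I picked for illustration, not anything from your setup):

```python
# Sketch only: assumes a dev cluster on localhost:9200 and illustrative
# index names "logs-v1" / "logs-v2"; not production-hardened.
import requests

es = "http://localhost:9200"

# 1) Create the index with 5 primary shards, but allow later splits up to 30
#    by fixing the number_of_routing_shards at creation time.
requests.put(f"{es}/logs-v1", json={
    "settings": {
        "index.number_of_shards": 5,
        "index.number_of_routing_shards": 30,
    }
}).raise_for_status()

# 2) The source index must be made read-only before it can be split.
requests.put(f"{es}/logs-v1/_settings", json={
    "index.blocks.write": True
}).raise_for_status()

# 3) Split 5 -> 10 shards into a new index (10 is a multiple of 5 and a
#    factor of 30, so it is an allowed target).
requests.post(f"{es}/logs-v1/_split/logs-v2", json={
    "settings": {"index.number_of_shards": 10}
}).raise_for_status()
```

The key detail is `index.number_of_routing_shards`: it is fixed when the index is created and determines which shard counts you can later split to, so choosing a generous value up front keeps the door open without running 100 shards from day one.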