Search code examples
cluster-computingreplicationshardingsolrcloud

Why use shards when there is replicas


I am using Solr and have a single collection with dynamic fields.

My goal is to setup a SolrCloud, and SolrWiki recommend this: https://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble

From my understanding replications gives you load balancing and redundancy, since it is a straight copy. Sharding gives you load balancing and acquires half the memory for the index but you are dependent of both working.

So when they set up the cluster like this with 4 servers, would the requests be approximately 4 times faster? If you only have 1 shard with 4 replicas, does it get 4 times faster with more redundancy?

I took for granted that there is no point in having virtual servers because it wouldn't give you more CPUs to work simultaneously.


Solution

  • In SolrCloud adding more replicas improves concurrency and adding more shards improves query response time. In other words, if your original query returned in 1 second, adding more replicas will probably not improve the response time but will give you more results per time period. But, splitting your index into more shards will defiantly reduce the response time.

    So, if you split you index from 1 shard into 4 shards you will get almost 4 times faster queries. But if you choose to have 1 shard with 4 replicas your query response time will probably improved only slightly.