Extending a solr collection across multiple machines


I am trying to set up a Solr collection that extends across multiple servers. If I understand correctly, a collection consists of shards, and each shard consists of replicas, each of which corresponds to a core. Please correct any holes in my understanding of this.

Ok.

So I've got solr set up and am able to create a collection on machine one by doing this.

bin/solr create_collection -c test_collection -shards 2 -replicationFactor 2 -d server/solr/configsets/basic_configs/conf

This appears to do something right: I am able to check the health and get output. I input

bin/solr healthcheck -c test_collection

and I see the shard information.

Now what I want to do, and this is the part I am stuck on, is to take this collection that I have created, and extend it across multiple servers. I'm not sure if I understand how this works correctly, but I think what I want to do is put shard1 on machine1, and shard2 on machine2.

I can't really figure out how to do this based on the documentation, although I am pretty sure this is what SolrCloud is meant to solve. Can someone give me a nudge in the right direction with this...? Either a way to extend the collection across multiple servers or a reason for not doing so.


Solution

  • When you say -shards 2, you're saying that you want your collection split into two pieces, which can then be placed on two servers. -replicationFactor 2 says that you want two copies of each of those shards, ideally on different servers as well.

    A shard is a piece of the collection: if a shard is unavailable, you won't have access to all the documents. The replicationFactor indicates how many copies of each shard should be made available (a shard is sometimes called a "partition", meaning a piece of the index). Two shards with a replication factor of two therefore end up as four "cores" distributed across the available servers (these "cores" are managed internally by Solr).

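    Start additional SolrCloud instances on the other machines, joined to the same cluster (that is, pointing at the same ZooKeeper), and the cores can be spread across your nodes. Solr assigns cores to the nodes that are live when the collection is created, so bring the extra nodes up first, or re-create the collection once they have joined.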
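As a rough sketch of the points above, the script below works out how many cores the question's settings produce and prints the command that would start a SolrCloud node on each machine. The ZooKeeper address `zk1.example.com:2181` and port `8983` are placeholders, not values from the question. The `-c`, `-z` and `-p` flags are the ones `bin/solr start` accepts in the Solr releases that ship `basic_configs`; check `bin/solr start -help` for your version. The script only prints the command, it does not run Solr.

```shell
#!/bin/sh
# Values taken from the question's create_collection command.
SHARDS=2
REPLICATION_FACTOR=2

# Hypothetical ZooKeeper address and Solr port; replace with your own.
ZK_HOST="zk1.example.com:2181"
SOLR_PORT=8983

# Each shard is stored REPLICATION_FACTOR times, so the cluster hosts
# SHARDS * REPLICATION_FACTOR cores in total.
TOTAL_CORES=$((SHARDS * REPLICATION_FACTOR))
echo "cores to place: $TOTAL_CORES"

# Command to run on every machine that should join the cluster.
# -c starts Solr in SolrCloud mode, -z points at ZooKeeper, -p sets the port.
START_CMD="bin/solr start -c -z $ZK_HOST -p $SOLR_PORT"
echo "run on each machine: $START_CMD"
```

Once every node is up and the collection has been created, the `bin/solr healthcheck -c test_collection` command from the question should list the replicas of each shard, which lets you confirm where the four cores ended up.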