
Solr Cloud: Distribution of Shards across nodes


I'm currently using Solr Cloud 6.1; the same behavior can also be observed in versions up to 7.0.

I'm trying to create a Solr collection with 5 shards and a replication factor of 2. I have 5 physical servers. Normally, this would distribute all 10 replicas evenly among the available servers, i.e. two replicas per server.
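For reference, a collection with this layout is typically created through the Collections API. The following is only a sketch of how such a request URL could be assembled: the base URL `http://solr-0:8983/solr` and the `maxShardsPerNode` value of 2 are assumptions, since the question does not state which node was addressed or which limit was used.

```python
from urllib.parse import urlencode

# Assumed address of one node in the cluster; not taken from the question.
SOLR_BASE_URL = "http://solr-0:8983/solr"


def build_create_url(name, num_shards, replication_factor,
                     max_shards_per_node, base_url=SOLR_BASE_URL):
    """Build a Collections API CREATE URL for the requested layout."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # Assumption: 10 replicas on 5 nodes need at least 2 cores per node.
        "maxShardsPerNode": max_shards_per_node,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # 5 shards x 2 replicas = 10 cores to place on 5 nodes.
    print(build_create_url("wikipedia", 5, 2, 2))
```

Sending a GET request to the printed URL (for example with `curl`) would trigger the collection creation.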

But when I start Solr Cloud with the -h (hostname) parameter to give every Solr instance its own constant hostname, this no longer works. The distribution then looks like this:

solr-0:
wikipedia_shard1_replica1  wikipedia_shard2_replica1  wikipedia_shard3_replica2  wikipedia_shard4_replica1  wikipedia_shard4_replica2

solr-1:

solr-2:
wikipedia_shard3_replica1  wikipedia_shard5_replica1  wikipedia_shard5_replica2

solr-3:
wikipedia_shard1_replica2

solr-4:
wikipedia_shard2_replica2
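To make the imbalance easier to see, here is a small sketch that counts the cores per node and lists the shards whose replicas ended up on the same node. The `placement` dictionary is copied by hand from the listing above; the helper names are made up for illustration.

```python
from collections import Counter

# Core placement as reported in the listing above.
placement = {
    "solr-0": [
        "wikipedia_shard1_replica1", "wikipedia_shard2_replica1",
        "wikipedia_shard3_replica2", "wikipedia_shard4_replica1",
        "wikipedia_shard4_replica2",
    ],
    "solr-1": [],
    "solr-2": [
        "wikipedia_shard3_replica1", "wikipedia_shard5_replica1",
        "wikipedia_shard5_replica2",
    ],
    "solr-3": ["wikipedia_shard1_replica2"],
    "solr-4": ["wikipedia_shard2_replica2"],
}


def replica_counts(placement):
    """Return the number of cores hosted on each node."""
    return {node: len(cores) for node, cores in placement.items()}


def colocated_shards(placement):
    """Return, per node, the shards that have more than one replica there."""
    result = {}
    for node, cores in placement.items():
        # Core names look like <collection>_<shard>_<replica>.
        per_shard = Counter(core.rsplit("_", 2)[1] for core in cores)
        duplicates = sorted(s for s, n in per_shard.items() if n > 1)
        if duplicates:
            result[node] = duplicates
    return result


if __name__ == "__main__":
    print(replica_counts(placement))
    # {'solr-0': 5, 'solr-1': 0, 'solr-2': 3, 'solr-3': 1, 'solr-4': 1}
    print(colocated_shards(placement))
    # {'solr-0': ['shard4'], 'solr-2': ['shard5']}
```

Besides the uneven load, both replicas of shard4 and of shard5 sit on a single node, so losing that node would take the whole shard offline.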

I tried using Rule-based Replica Placement, but the rules seem to be ignored.

I need to use hostnames because Solr runs in a Kubernetes cluster, where IP addresses change frequently and Solr won't find its cores after a container restart. I first suspected a newer Solr version to be the cause of this, but I narrowed it down to the hostname problem.

Is there any solution for this?


Solution

  • The solution was actually quite simple (but not really documented):

    When creating a Service in OpenShift/Kubernetes, all matching Pods get backed by a load balancer. When every Solr instance is assigned a unique hostname, these hostnames all resolve to one single IP address (that of the load balancer).

    Solr somehow can't deal with that and fails to distribute its shards evenly.

    The solution is to use headless services from Kubernetes. Headless services aren't backed by a load balancer, and therefore every hostname resolves to a unique IP address.
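As a rough illustration of the fix described above, the sketch below generates a headless Service manifest (a Service whose `clusterIP` is set to `None`) and includes a small check that a list of hostnames resolves to distinct IP addresses. The Service name `solr` and the label `app: solr` are assumptions and must match your own Pods; 8983 is Solr's default port. The helper names are made up for this example.

```python
import json
import socket


def headless_service(name, selector, port=8983):
    """Return a Kubernetes Service manifest without a cluster IP (headless)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # The literal string "None" is what makes the Service headless.
            "clusterIP": "None",
            "selector": selector,  # assumed Pod labels
            "ports": [{"name": "solr", "port": port}],
        },
    }


def resolves_to_unique_ips(hostnames, resolve=socket.gethostbyname):
    """Return True if every hostname resolves to a different IP address."""
    ips = [resolve(host) for host in hostnames]
    return len(set(ips)) == len(ips)


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, e.g.:
    #   python headless.py | kubectl apply -f -
    print(json.dumps(headless_service("solr", {"app": "solr"}), indent=2))
```

Running `resolves_to_unique_ips` from inside the cluster with the hostnames passed to `-h` is a quick way to confirm that they no longer all point at the load balancer.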