I just want to understand the query flow and how load balancing works with LBHttpSolrServer. We have set up SolrCloud with one collection; that collection has 4 shards, and each shard has two nodes, i.e. one master and one replica.
I have configured LBHttpSolrServer as below.
SolrServer lbHttpSolrServer = new LBHttpSolrServer("http://shard1_master:8080/solr/", "http://shard2_master:8080/solr/", "http://shard3_master:8080/solr/", "http://shard4_master:8080/solr/", "http://shard1_replica:8080/solr/", "http://shard2_replica:8080/solr/", "http://shard3_replica:8080/solr/", "http://shard4_replica:8080/solr/");
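Per the LBHttpSolrServer Javadoc, requests are balanced with a simple round-robin over the configured URLs, and a server whose request fails is moved to a dead server list until a periodic health check sees it respond again. The following is a plain-Java simulation of that behaviour, not SolrJ code; the class name, the `down` set that fakes failing servers, and `revive()` standing in for the health check are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RoundRobinSketch {
    private final List<String> urls;                            // configured server URLs, in order
    private final Set<String> zombies = new LinkedHashSet<>();  // hypothetical "dead server list"
    private int next = 0;                                       // round-robin cursor

    public RoundRobinSketch(List<String> urls) {
        this.urls = new ArrayList<>(urls);
    }

    // 'down' is hypothetical: it simulates which servers would fail a request right now.
    public String pick(Set<String> down) {
        for (int attempt = 0; attempt < urls.size(); attempt++) {
            String url = urls.get(next);
            next = (next + 1) % urls.size();    // advance the cursor whatever happens
            if (zombies.contains(url)) {
                continue;                       // skip servers already marked dead
            }
            if (down.contains(url)) {
                zombies.add(url);               // request "failed": mark dead, try the next one
                continue;
            }
            return url;
        }
        throw new IllegalStateException("No live servers available");
    }

    // Stand-in for the periodic health check: a dead server that responds again rejoins.
    public void revive(String url) {
        zombies.remove(url);
    }

    public static void main(String[] args) {
        RoundRobinSketch lb = new RoundRobinSketch(List.of(
            "http://shard1_master:8080/solr/",
            "http://shard2_master:8080/solr/",
            "http://shard1_replica:8080/solr/"));
        Set<String> down = Set.of("http://shard2_master:8080/solr/");
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.pick(down));
        }
    }
}
```

In this run the failing second URL is skipped and marked dead, so the four requests alternate between shard1_master and shard1_replica.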
From my understanding, Solr and SolrJ work as follows:
My confusion is at point number 4: is my understanding correct? If not, please correct me. Also, do I need to pass all 8 nodes to LBHttpSolrServer, or will just 4 be sufficient?
Yes, that is correct. But instead of using LBHttpSolrServer, you can use CloudSolrServer, which is cloud-aware.
CloudSolrServer will automatically load balance requests across the nodes that comprise the collection that is being queried. Newer versions of the client will also route updates directly to the leader of the correct shard, which reduces load on the servers and speeds up indexing.
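To illustrate what routing an update "directly to the leader of the correct shard" involves: the client hashes the document id and sends the update to the leader of the shard whose hash range contains that value. This is a plain-Java sketch under stated assumptions, not the client's actual code: Solr's default compositeId router hashes with MurmurHash3, whereas `String.hashCode()` is substituted here, the four shards are assumed to split the 32-bit hash space into equal ranges, and the leader URLs are hypothetical (a real client reads the current leaders from ZooKeeper).

```java
import java.util.List;

public class LeaderRoutingSketch {
    // Hypothetical leader URLs for shard1..shard4; a real client looks these up in ZooKeeper.
    static final List<String> LEADERS = List.of(
        "http://shard1_master:8080/solr/",
        "http://shard2_master:8080/solr/",
        "http://shard3_master:8080/solr/",
        "http://shard4_master:8080/solr/");

    // Map a 32-bit hash onto one of numShards equal, contiguous ranges of the int space.
    static int shardFor(int hash, int numShards) {
        long offset = (long) hash - Integer.MIN_VALUE;   // shift into [0, 2^32)
        return (int) (offset * numShards / (1L << 32));
    }

    // Assumption: String.hashCode() stands in for the MurmurHash3 hash Solr actually uses.
    static String leaderFor(String docId) {
        return LEADERS.get(shardFor(docId.hashCode(), LEADERS.size()));
    }

    public static void main(String[] args) {
        for (String id : List.of("doc-1", "doc-2", "doc-3")) {
            System.out.println(id + " -> " + leaderFor(id));
        }
    }
}
```

Because the target shard is computed on the client, the update skips the extra hop a non-leader node would otherwise make when forwarding it to the leader.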
Internally, CloudSolrServer uses an instance of LBHttpSolrServer, but the list of URLs is managed dynamically from the cluster state in ZooKeeper, so your program doesn't need to worry about it.
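The idea behind a dynamically managed URL list can be sketched as deriving it from the cluster state instead of hard-coding it. This plain-Java sketch illustrates that idea only and is not CloudSolrServer's code; the `Replica` class, the node names, and the `collection1` core URLs are hypothetical. It keeps every replica of every shard whose node is in the live-node set, so a node that goes down simply drops out of the list handed to the load balancer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ClusterStateUrlsSketch {
    // Hypothetical replica record: the node hosting it and the core URL on that node.
    static final class Replica {
        final String node;
        final String url;

        Replica(String node, String url) {
            this.node = node;
            this.url = url;
        }
    }

    // Collect the URL of every replica, across all shards, whose node is currently live.
    static List<String> liveUrls(Map<String, List<Replica>> shards, Set<String> liveNodes) {
        List<String> urls = new ArrayList<>();
        for (List<Replica> replicas : shards.values()) {
            for (Replica r : replicas) {
                if (liveNodes.contains(r.node)) {
                    urls.add(r.url);
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        Map<String, List<Replica>> shards = new LinkedHashMap<>();
        shards.put("shard1", List.of(
            new Replica("shard1_master:8080", "http://shard1_master:8080/solr/collection1/"),
            new Replica("shard1_replica:8080", "http://shard1_replica:8080/solr/collection1/")));
        shards.put("shard2", List.of(
            new Replica("shard2_master:8080", "http://shard2_master:8080/solr/collection1/"),
            new Replica("shard2_replica:8080", "http://shard2_replica:8080/solr/collection1/")));
        // Simulate shard2_master going down: it is missing from the live-node set.
        Set<String> live = Set.of("shard1_master:8080", "shard1_replica:8080", "shard2_replica:8080");
        System.out.println(liveUrls(shards, live));
    }
}
```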