solr lucene apache-zookeeper solrcloud

Adding ZooKeeper to an existing Solr setup


I have an existing Solr setup running on a standalone Solr instance. I have been asked to add resilience and high availability to it, so I would like to add replication, for which I believe SolrCloud is the way to go.

I have run through the demos on the SolrCloud wiki. However, I am not sure how to add my existing Solr instance to ZooKeeper and then add more nodes for it to replicate to. Is this possible without re-running the bulk index?

The wiki states

NOTE: When you are not using an example to start solr, make sure you upload the configuration set to zookeeper before creating the collection.

However, I am unsure which files it is referring to and how to upload them.

Current setup info:

  • Solr 4.5.1
  • 2 vCPUs, 24 GB RAM
  • 66 million docs in index
  • 58 GB index size
  • Bulk index time ~50 hours
  • 4000 max users
  • 400 average concurrent users
  • 20k updates per day
  • Users search via a SolrJ application
  • Querying involves grouping

Wish list

  • Existing Solr index replicated to 2 new nodes
  • 3 ZooKeeper nodes to provide resilience

What I have tried:

  • Downloaded ZooKeeper and ran zkServer start with default settings: OK
  • Started the existing Solr setup with the option -DzkHost=actualhostname:2181

But I receive an error from Solr: "Could not load SOLR configuration".

So I guess my question comes down to:

  1. For my setup, is SolrCloud the way to go rather than, say, ReplicationHandler?
  2. Is it possible to add SolrCloud and ZK support without re-indexing (50 hrs is a long time)?
  3. Which config files am I supposed to be adding to ZK, and how?
  4. Am I correct that, without additional config changes, sharding is not an option because I am using grouping in my queries?
  5. Should I upgrade from Solr 4.5.1? If so, how far?
  6. Most importantly, does my "Wish list" look like a good idea/bad idea/moon on a stick? If good, how do I achieve it? If bad, any suggestions?

I am pretty new to Solr (~12 months use) and very new to Zookeeper and SolrCloud (~2 weeks reading/experimenting), so any advice on achieving the above would be very much appreciated.


Solution

    • For my setup is SolrCloud the way to go rather than say ReplicationHandler?

    SolrCloud is the way forward with Solr, so I'd say yes.

    • Is it possible to add solrCloud and ZK support without re-indexing (50hrs is a long time)?

    If you only add replicas and don't use sharding, there is no need to reindex.
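
    A rough sketch of what adding a replica could look like on Solr 4.x, assuming the existing node is already running in SolrCloud mode and holds the single shard of a collection. The new node is started with the same -DzkHost value, and a core is then created on it through the CoreAdmin CREATE action with the collection and shard parameters, after which it should copy the index from the leader instead of needing a re-index. The host name, collection name, shard name, and core name below are placeholders, not values from the question, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Dry-run sketch: builds and prints the CoreAdmin request, does not send it.
NEW_NODE="solr2:8983"         # hypothetical host:port of the new Solr node
COLLECTION="collection1"      # assumed collection name
SHARD="shard1"                # assumed name of the single existing shard
CORE_NAME="${COLLECTION}_${SHARD}_replica2"   # hypothetical core name on the new node

# CoreAdmin CREATE with collection and shard attaches the new core to that shard as a replica.
CREATE_URL="http://$NEW_NODE/solr/admin/cores?action=CREATE&name=$CORE_NAME&collection=$COLLECTION&shard=$SHARD"

echo "curl '$CREATE_URL'"
```

    Repeat with a different host and core name for the second new node.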

    • Which config files am I supposed to be adding to zk and how?

    Start your first Solr node with -Dbootstrap_conf=true; this will load your config files into ZK.
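
    As a sketch of the two usual ways to get the config into ZooKeeper on Solr 4.x: either bootstrap it from the first node at startup, or upload the core's conf directory (solrconfig.xml, schema.xml, and the files they reference) with the zkcli.sh script that ships in the cloud-scripts directory. The ZooKeeper addresses, the conf path, and the config name "myconf" are assumptions for illustration, and -DnumShards=1 is added on the assumption that the index stays in a single shard. The script prints the commands rather than running them.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
ZK_HOST="zk1:2181,zk2:2181,zk3:2181"                  # assumed ZooKeeper ensemble address
CONF_DIR="/opt/solr/example/solr/collection1/conf"    # assumed location of the existing core's conf directory
CONF_NAME="myconf"                                    # hypothetical name for the config set in ZooKeeper

# Option 1: start the first Solr node so that it uploads the conf directory of each core it hosts.
BOOTSTRAP_CMD="java -Dbootstrap_conf=true -DnumShards=1 -DzkHost=$ZK_HOST -jar start.jar"

# Option 2: upload the conf directory by hand before starting Solr in cloud mode.
UPCONFIG_CMD="cloud-scripts/zkcli.sh -cmd upconfig -zkhost $ZK_HOST -confdir $CONF_DIR -confname $CONF_NAME"

echo "$BOOTSTRAP_CMD"
echo "$UPCONFIG_CMD"
```

    The bootstrap flag is only needed on the first start; later starts and additional nodes only need -DzkHost.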

    • Am I correct that without additional config changes sharding is not an option because I am using grouping in my queries?

    That depends on what exactly you do with grouping. See https://wiki.apache.org/solr/DistributedSearch for what is and isn't supported.

    • Should I upgrade from solr 4.5.1 if so how far?

    Upgrading to the latest version is a good idea, although from Solr 4.8 onward you will need Java 7.

    • Most importantly, does my "Wish list" look like a good idea/bad idea/moon on a stick? If good, how to achieve it? If bad, any suggestions?

    I vote for good idea; I have a similar setup in mind.
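
    For the three-node ZooKeeper part of the wish list, here is a minimal sketch that generates a zoo.cfg and a myid file per node into a temporary directory. The host names zk1, zk2, zk3 and the data directory are placeholders; the timing values are the ones from ZooKeeper's sample config, and 2888/3888 are the conventional quorum and leader-election ports.

```shell
#!/bin/sh
# Sketch: writes one zoo.cfg and one myid per node under a temp directory.
OUT_DIR="$(mktemp -d)"
ZK_NODES="zk1 zk2 zk3"            # hypothetical host names of the three ZooKeeper servers
DATA_DIR="/var/lib/zookeeper"     # assumed dataDir on each server

id=1
for host in $ZK_NODES; do
  node_dir="$OUT_DIR/$host"
  mkdir -p "$node_dir"
  {
    echo "tickTime=2000"
    echo "initLimit=10"
    echo "syncLimit=5"
    echo "dataDir=$DATA_DIR"
    echo "clientPort=2181"
    # Every node lists all ensemble members, itself included.
    n=1
    for peer in $ZK_NODES; do
      echo "server.$n=$peer:2888:3888"
      n=$((n + 1))
    done
  } > "$node_dir/zoo.cfg"
  # myid goes into dataDir on the matching server and must equal its server.N number.
  echo "$id" > "$node_dir/myid"
  id=$((id + 1))
done

echo "Generated files under $OUT_DIR"
```

    Solr is then pointed at the whole ensemble, for example -DzkHost=zk1:2181,zk2:2181,zk3:2181, so that losing one ZooKeeper node still leaves a quorum of two.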