How to correctly add additional SOLR 5 (vm) nodes to SOLR Cloud


I have a SOLR / Zookeeper / Kafka setup, each on separate VMs.

I have successfully run all of this using two SOLR 4.9 VMs (Ubuntu).

Now I wish to build two SOLR 5.4 VMs and get it all working again.

Essentially, "Upgrade by Replacement".

I have "hacked" a solution to my problem but that makes me very nervous.

To begin, Zookeeper is running. I turn off my SOLR 4.9 vms and delete the config out of Zookeeper (not necessarily in that order... ;-) )

Now, I start up my 'solr5' VM (and SOLR in cloud mode) where I have installed SOLR 5.4 according to the "Production Install" instructions on the SOLR Wiki. I have also installed 5.4 on 'solr6', but it's not running yet.

I issue this command on the 'solr5' machine:

/opt/solr/bin/solr create -c fooCollection -d /home/john/conf -shards 1 -replicationFactor 1

and I get the following output:

Connecting to ZooKeeper at 192.168.56.5,192.168.56.6,192.168.56.7/solr ...
Re-using existing configuration directory statdx

Creating new collection 'fooCollection' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=fooCollection&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=fooCollection

{
  "responseHeader":{
    "status":0,
    "QTime":3822},
  "success":{"":{
      "responseHeader":{
        "status":0,
        "QTime":3640},
      "core":"fooCollection_shard1_replica1"}}}

Everything is working great. I turn on my microservice, and it pumps all my SOLR docs from Kafka into 'solr5'.

Now, I want to add 'solr6' to the collection. I can't find a way to do this besides my hack (which I'll describe later).

The command I used before to create a collection errors out with the observation that my collection already exists.

There seems to be no zkcli.sh or solr command that will do what I want. None of the API commands seem to do this either.

Is there not a simple way to say to (SOLR? Zookeeper?) I want to add another machine to my SOLR nodes, please configure it like the first (solr5) and begin replicating data?

Maybe I should have had both machines running when I issued the create command?

I'd be grateful for some "approved" method for doing this since I need to come up with a "solution" to do the same kind of approach in Prod every time there is a need to upgrade SOLR.

Now for my hack. Keep in mind I've now spent two days trying to find clear docs on this. No flames please, I totally get that this is not the way to do things. At least, I HOPE this is not the way to do things...

  1. Copy the fooCollection directory from where the create collection command put it on 'solr5' (which was /opt/solr/server/solr/fooCollection_shard1_replica1) to the same location on my 'solr6' VM.
  2. Make what changes seem logical to the collection directory name (becomes fooCollection_shard1_replica2)
  3. Make what changes seem logical in the core.properties file:

For reference, here's the core.properties file that was created by the create command.

#Written by CorePropertiesLocator
#Wed Jan 20 18:59:08 UTC 2016
numShards=1
name=fooCollection_shard1_replica1
shard=shard1
collection=fooCollection
coreNodeName=core_node1

Here is what the file looked like on 'solr6' when I was done hacking.

#Written by CorePropertiesLocator
#Wed Jan 20 18:59:08 UTC 2016
numShards=1
name=fooCollection_shard1_replica2
shard=shard1
collection=fooCollection
coreNodeName=core_node2

When I did this and rebooted 'solr6' everything appeared golden. The "Cloud" web page looked right in the Admin web page - and when I added documents to 'solr5' they were available in 'solr6' if I hit it directly from the Admin web pages.

I would be grateful if someone can tell me how to achieve this without a hack like this... or if this IS the right way to do this...

=============================

In answer to @Mani and the suggested procedure

Thanks Mani - I did try this very carefully following your steps.

In the end, I get this output from the collection status query:

john@solr6:/opt/solr$ ./bin/solr healthcheck -z 192.168.56.5,192.168.56.6,192.168.56.7/solr5_4 -c fooCollection
{
  "collection":"fooCollection",
  "status":"healthy",
  "numDocs":0,
  "numShards":1,
  "shards":[{
      "shard":"shard1",
      "status":"healthy",
      "replicas":[{
          "name":"core_node1",
          "url":"http://192.168.56.15:8983/solr/fooCollection_shard1_replica1/",
          "numDocs":0,
          "status":"active",
          "uptime":"0 days, 0 hours, 6 minutes, 24 seconds",
          "memory":"31 MB (%6.3) of 490.7 MB",
          "leader":true}]}]}

This is the kind of result I've been getting in my experimentation all along. The core gets created on one of the SOLR VMs (the one where I issue the create command), but nothing gets created on the other VM -- although, based on your steps below, I believe you also expected a core to be created there, yes?
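
A quick way to make this check less eyeball-dependent is to count the replicas that healthcheck reports. The following is only a minimal sketch: `count_replicas` is a hypothetical helper name, and it does a plain text match on the `"name":"core_nodeN"` entries seen in the output above rather than a real JSON parse.

```shell
#!/bin/sh
# Hypothetical helper: count replica entries in `bin/solr healthcheck` output
# read from stdin. Assumes replicas appear as "name":"core_nodeN", as in the
# output shown above; this is a text match, not a real JSON parse.
count_replicas() {
  grep -o '"name":"core_node[0-9]*"' | wc -l | tr -d ' '
}

# Usage (ZooKeeper addresses and chroot taken from the command above):
#   /opt/solr/bin/solr healthcheck -z 192.168.56.5,192.168.56.6,192.168.56.7/solr5_4 \
#     -c fooCollection | count_replicas
```

For the output above this would print 1; with a replica on each VM, 2 would be the number to look for.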

Also, I'll note for anyone reading that in 5.4, the command is "healthcheck" and not healthstatus. The command line shows you immediately, so it's no big deal.

===============

Update 1 :: Manual add of 2nd core

If I go to the other VM and manually add the following:

sudo mkdir /opt/solr/server/solr/fooCollection_shard1_replica2
sudo mkdir /opt/solr/server/solr/fooCollection_shard1_replica2/data
nano /opt/solr/server/solr/fooCollection_shard1_replica2/core.properties
     (in here I add only collection=fooCollection and then save/close)

Then I restart my SOLR server on that same VM: sudo /opt/solr/bin/solr restart -c -z zoo1,zoo2,zoo3/solr

I will find a second node magically appearing in my Admin console. It will be a "follower" (i.e. not the leader) and both will be branching off "shard1" in the cloud UI.

I don't know if this is "the way" but it's the only way I've found so far. I'm going to reproduce to that point and try with the Admin UI and see what I get. That would be a little easier for my IT guys when the time comes - if it works.
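
For what it's worth, the Collections API has an ADDREPLICA action that asks the cluster itself to create the extra replica, which may be a cleaner route than writing core.properties by hand. The sketch below is untested against this setup. `addreplica_url` is a made-up helper name, 192.168.56.15 is taken from the healthcheck output above, and the node name 192.168.56.16:8983_solr is an assumption about how the second VM registers itself (the live_nodes entry in the Cloud tree shows the real value).

```shell
#!/bin/sh
# Hypothetical helper: build a Collections API ADDREPLICA URL.
# $1 = base URL of any running Solr node, $2 = collection, $3 = shard,
# $4 = target node name as registered in ZooKeeper (assumed form: host:port_solr).
addreplica_url() {
  printf '%s/admin/collections?action=ADDREPLICA&collection=%s&shard=%s&node=%s\n' \
    "$1" "$2" "$3" "$4"
}

# Print the request. To send it once both Solr nodes are up, quote it for curl:
#   curl "$(addreplica_url http://192.168.56.15:8983/solr fooCollection shard1 192.168.56.16:8983_solr)"
addreplica_url http://192.168.56.15:8983/solr fooCollection shard1 192.168.56.16:8983_solr
```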

===============

Update 2 :: Slight modification of create command

@Mani -- I believe I have success following your steps - and like many things, it's simple once you understand.

I reset everything (deleted directories, cleared out Zookeeper with rmr /solr) and redid everything from scratch.

I changed the "create" command slightly thus:

./bin/solr create -c fooCollection -d /home/john/conf -shards 1 -replicationFactor 2

Note the "replicationFactor 2" rather than 1.

Suddenly I did indeed have cores on both VMs.

A couple of notes:

I found that I couldn't get a happy result from the status call just by starting the SOLR 5.4 servers in Cloud mode with the Zookeeper IP addresses. The "node" in Zookeeper was not yet created.

The create command also failed at that point.

The way I found around this was to use the zkcli.sh to load the configs like this:

sudo /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -confdir /home/john/conf/ -confname fooCollection -z 192.168.56.5/solr

When I checked Zookeeper immediately after running this command, there was a /solr/configs/fooCollection "path".
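
The upload-then-check step could be scripted roughly as follows. This is a sketch under assumptions: `zkcli` is a hypothetical wrapper that only prints the command line unless RUN=1 is set, and zkcli.sh's `list` command is used to dump the Zookeeper tree so the new configs/fooCollection path can be grepped for. The original command was run with sudo; add that if your install needs it.

```shell
#!/bin/sh
ZKCLI=/opt/solr/server/scripts/cloud-scripts/zkcli.sh
ZK=192.168.56.5/solr   # Zookeeper host plus chroot, as in the command above

# Hypothetical wrapper: print the zkcli.sh command line; execute it only when RUN=1.
zkcli() {
  if [ "${RUN:-0}" = "1" ]; then
    "$ZKCLI" -z "$ZK" "$@"
  else
    echo "$ZKCLI -z $ZK $*"
  fi
}

# 1. Upload the config directory under the name the create command will look for.
zkcli -cmd upconfig -confdir /home/john/conf/ -confname fooCollection
# 2. Dump the tree; with RUN=1, pipe this through: grep configs/fooCollection
zkcli -cmd list
```

Running it without RUN=1 first shows exactly what would be executed, which is handy when the same script is later pointed at a different chroot.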

NOW the create command works and I assume that if I had wanted to override the configs, I could have done so at that point although I haven't tried.

I'm not positive at what point, but it seems I needed to restart the SOLR servers (probably after the create command) in order to find everything on status etc. I may be misremembering that because I've been through it so many times. If in doubt after the create command, try a restart of the servers. The Zookeeper hosts can be given as IP addresses or as names that resolve correctly:

sudo /opt/solr/bin/solr restart -c -z zoo1,zoo2,zoo3/solr
sudo /opt/solr/bin/solr restart -c -z 192.168.56.5,192.168.56.6,192.168.56.7/solr

After making these slight modifications to @Mani's recommended procedure, I get a leader and a "follower", each on a different VM, with the cores in the /opt/solr/server/solr directory (fooCollection in this case). I was able to send data in to one and search the other via the Admin console by hitting the IP addresses directly.

=============

Variations

One thing anyone reading this may want to try is simply making another "node" in Zookeeper (solr5_4 for example).

I tried this and it works like a charm. Everywhere you see the /solr chroot associated with the Zookeeper ensemble, you could replace it with /solr5_4. This would allow the older SOLR VM's to keep functioning in Prod while you build out your new SOLR 5.4 "environment" and the same Zookeeper VM's could be used for both -- because a different chroot should guarantee no interaction or overlap.

Again, the "node" in Zookeeper won't be created until you do the config upload, but you need to start your SOLR process like this or you'd be in the wrong context later on. Note the "solr5_4" as the chroot.

sudo /opt/solr/bin/solr restart -c -z zoo1,zoo2,zoo3/solr5_4
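
To keep the two environments from getting mixed up, the connection strings can be built in one place. A minimal sketch, assuming the same zoo1, zoo2, zoo3 host names as above; `zk_connect` is a hypothetical helper, not part of Solr or Zookeeper.

```shell
#!/bin/sh
# Hypothetical helper: join Zookeeper hosts with commas and append one chroot
# at the very end (the chroot applies to the whole ensemble, not to each host).
zk_connect() {
  zk_chroot=$1; shift
  hosts=$(IFS=,; echo "$*")
  echo "${hosts}${zk_chroot}"
}

OLD_ZK=$(zk_connect /solr    zoo1 zoo2 zoo3)   # existing 4.x environment
NEW_ZK=$(zk_connect /solr5_4 zoo1 zoo2 zoo3)   # new 5.4 environment
echo "old: $OLD_ZK"
echo "new: $NEW_ZK"
# Then, on each new VM:  sudo /opt/solr/bin/solr restart -c -z "$NEW_ZK"
```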

Once done with testing, the solr5_4 "environment" becomes what matters for Prod and the SOLR 4.x VM's and Zookeeper "node" of solr can be removed. It should be a fairly simple matter to point a load balancer at the new SOLR VM's and do a switchover without users really even noticing.

This strategy should also work for SOLR 6, 6.5, 7, and so on.

The following Collections API call also worked to add the collection and its cores. However, the SOLR servers had to be running first.

http://192.168.56.16:8983/solr/admin/collections?action=CREATE&name=fooCollection&numShards=1&replicationFactor=2&collection.configName=fooCollection
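
When sending that URL from a shell rather than a browser, it needs quoting because of the & characters. A small sketch, assuming curl is available on the VM; `create_url` is a hypothetical helper that just rebuilds the URL above from its parts.

```shell
#!/bin/sh
SOLR_URL=http://192.168.56.16:8983/solr   # node from the URL above

# Hypothetical helper: build the CREATE request. $1 = collection (also used as
# the config name, as above), $2 = numShards, $3 = replicationFactor.
create_url() {
  printf '%s/admin/collections?action=CREATE&name=%s&numShards=%s&replicationFactor=%s&collection.configName=%s\n' \
    "$SOLR_URL" "$1" "$2" "$3" "$1"
}

create_url fooCollection 1 2
# To send it, keep the quotes so the shell does not treat & as "run in background":
#   curl "$(create_url fooCollection 1 2)"
```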

==================

Use as Upgrade By Replacement

In case it's not obvious, this technique (especially if using a "new" chroot in Zookeeper, such as /solr5_4) gives you the luxury of leaving your older version of SOLR running for as long as you want, allowing a re-indexing of all your data to take days if needed.

I haven't tried, but I'm guessing a backup of the index could be dropped into the new machines as well.

I just wanted readers to understand that this was an approach intended to make upgrades really low stress and straightforward. (Don't need to upgrade in place, just build new VMs and install latest version of SOLR.)

This would allow the switch-over to occur without affecting prod until you're ready to drop the hammer and re-direct your load balancer at the new SOLR IP addresses (which you will have already tested, of course).

The one assumption here is that you have the resources to bring up a set of SOLR VMs or physical servers to match whatever you already have in Production. Obviously, if you're resource-limited to only the boxes or VMs you have, upgrade-in-place may be your only option.


Solution

  • This is how I would do it. I am assuming that you have the luxury of downtime and the ability to completely reindex the documents, since you are essentially upgrading from 4.9 to 5.4.

    • Stop the 4.9 Solr nodes and uninstall Solr.
    • Remove the config from the Zookeeper nodes using zkcli.sh with the clear command.
    • Install Solr on both the solr5 and solr6 VMs.
    • Start both Solr nodes and make sure both can talk to Zookeeper:
      On the solr5 VM => ./bin/solr start -c -z zk1:port1,zk2:port1,zk3:port1
      On the solr6 VM => ./bin/solr start -c -z zk1:port1,zk2:port1,zk3:port1
    • Verify the status of SolrCloud using ./bin/solr status => this should return liveNodes as 2.
    • Now create fooCollection using the Collections API from any one of the Solr nodes. This uploads the config set to Zookeeper and also creates the collection => ./bin/solr create -c fooCollection -d /home/john/conf -shards 1 -replicationFactor 1

    • Verify the healthstatus of the fooCollection => ./bin/solr healthstatus -z zk1:port1,zk2:port1,zk3:port1 -c fooCollection

    • Now verify the config is present in Zookeeper by checking Solr Admin Console -> Cloud -> Tree -> /configs.
    • Also check Cloud -> Graph, which should show the nodes with active status. That indicates that everything is good.
    • Now start pushing documents into the collection.

    The wiki page below is very helpful for the steps above: https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference
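
    The command-line part of the steps above can be sketched as a dry-run script. Assumptions: `step` is a hypothetical wrapper that only prints each command unless RUN=1 is set, zk1:port1 etc. are the same placeholders as above, the health check uses the 5.4 command name healthcheck, and -replicationFactor is set to 2 here (rather than the 1 shown above) so that each of the two nodes gets a replica.

```shell
#!/bin/sh
ZK=zk1:port1,zk2:port1,zk3:port1   # placeholder from the steps above; substitute real host:port pairs

# Hypothetical wrapper: print each step; execute it only when RUN=1.
step() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Run on BOTH solr5 and solr6 before creating the collection:
step ./bin/solr start -c -z "$ZK"
step ./bin/solr status
# Run on ONE node only, after both nodes are live:
step ./bin/solr create -c fooCollection -d /home/john/conf -shards 1 -replicationFactor 2
step ./bin/solr healthcheck -z "$ZK" -c fooCollection
```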