
Existing SOLR collection not picking up Zookeeper schema change


I have a local solr cloud cluster running on three separate nodes: 33.33.33.3[3-5]:8080. This cluster is managed by a local 3 node zookeeper ensemble that lives at: 33.33.33.3[0-2]:2181

I am trying to experiment with schema modifications. However, I'm having trouble getting SOLR to pick up the new changes. Here is what I'm doing:

First I upload one config set to zookeeper:

/opt/src/solr/scripts/cloud-scripts/zkcli.sh -zkhost 33.33.33.30:2181,33.33.33.31:2181,33.33.33.32:2181 -cmd upconfig -confdir /opt/src/solr/solr/conf/ -confname test_conf

Then I create a collection in SOLR:

http://33.33.33.33:8080/solr/admin/collections?action=CREATE&name=test_collection&numShards=1&replicationFactor=3

This all works fine. Since there is only one config in zookeeper, this is automatically mapped to the collection on creation. Pretty cool.
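With more than one config set in zookeeper it is safer to name the config explicitly. A minimal sketch of the same CREATE call using the Collections API's `collection.configName` parameter; the helper name `create_collection_url` is made up for illustration, and the host and names are the ones used above:

```shell
#!/bin/sh
# Solr node taken from the question above.
SOLR_HOST="33.33.33.33:8080"

# Hypothetical helper: build a Collections API CREATE URL that names the
# config set explicitly instead of relying on the single-config default.
#   $1 = collection name, $2 = config set name in ZooKeeper
create_collection_url() {
  printf 'http://%s/solr/admin/collections?action=CREATE&name=%s&numShards=1&replicationFactor=3&collection.configName=%s\n' \
    "$SOLR_HOST" "$1" "$2"
}

# Print the URL; to actually send the request, pass it to curl:
#   curl "$(create_collection_url test_collection test_conf)"
create_collection_url test_collection test_conf
```

The script only prints the URL, so it can be inspected before anything is sent to the cluster.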

But now I want to modify the schema for test_collection. So, I ssh into one of my SOLR boxes, browse to /opt/src/solr/solr/conf/, open schema.xml in vim, and remove a field. Then I upload the config again (using the same name so it overwrites the old config):

/opt/src/solr/scripts/cloud-scripts/zkcli.sh -zkhost 33.33.33.30:2181,33.33.33.31:2181,33.33.33.32:2181 -cmd upconfig -confdir /opt/src/solr/solr/conf/ -confname test_conf

Now I reload the collection:

http://33.33.33.33:8080/solr/admin/collections?action=RELOAD&name=test_collection

And zookeeper picks up the changes. I can download the file from zookeeper and the changes are there. I can browse the config in SOLR admin (cloud>tree>configs>schema.xml AND test_collection>files>schema.xml) and the changes are reflected. However, if I hit this route: http://33.33.33.33:8080/solr/test_collection/schema/fields the field is still there. Also, if I go to test_collection>schema browser in the SOLR admin the field is still listed there as well.
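One way to script this check, as a rough sketch: fetch `/schema/fields` and look for the field name in the JSON response. `my_removed_field` is a placeholder, since the field is not named here, and the sample response below is hand-written for illustration, not real SOLR output:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the JSON read from stdin contains a field
# entry with the given name. This is a plain text match, not a JSON parser.
has_field() {
  grep -Eq "\"name\" *: *\"$1\""
}

# Hand-written sample shaped like a /schema/fields response (illustrative only).
sample='{"fields":[{"name":"id","type":"string"},{"name":"my_removed_field","type":"string"}]}'

if printf '%s\n' "$sample" | has_field my_removed_field; then
  echo "my_removed_field is still in the live schema"
else
  echo "my_removed_field is gone"
fi

# Against the live collection, pipe curl into the same helper:
#   curl -s "http://33.33.33.33:8080/solr/test_collection/schema/fields?wt=json" | has_field my_removed_field
```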

What's going on here?

EDIT:

If I look at the logs in SOLR admin I see the following which must be related...

2/23/2015, 3:06:46 PM  WARN  OverseerCollectionProcessor  OverseerCollectionProcessor.processMessage : reloadcollection , {
2/23/2015, 3:06:46 PM  WARN  ManagedIndexSchemaFactory  The schema has been upgraded to managed, but the non-managed schema schema.xml is still loadable. PLEASE REMOVE THIS FILE.
2/23/2015, 3:06:46 PM  WARN  RequestHandlers  Multiple requestHandler registered to the same name: /update/json ignoring: org.apache.solr.handler.UpdateRequestHandler
2/23/2015, 3:06:46 PM  WARN  RequestHandlers  Multiple requestHandler registered to the same name: /update ignoring: org.apache.solr.handler.UpdateRequestHandler
2/23/2015, 3:06:46 PM  WARN  RequestHandlers  Multiple requestHandler registered to the same name: /replication ignoring: org.apache.solr.handler.ReplicationHandler

Solution

  • I eventually figured this out after spending so much time with SOLR over the past few months.

    Let's break down the problem that I was seeing.

    I was uploading a config to zookeeper, creating a collection in solr, and linking the two together. Then I would change the schema - upload it again, reload the solr core - and nothing would happen!

    This was, at its core, user error and a misunderstanding of one main feature.

    I was using a managed schema within SOLR, which means I could take advantage of the schema API in the newer versions of SOLR. For anyone who is interested: when you use a managed schema, SOLR makes a copy of your schema.xml (the managed-schema file), and THIS is the copy it loads and edits. Changes go there, not to your original schema.xml, so re-uploading an edited schema.xml has no effect, and http://33.33.33.33:8080/solr/test_collection/schema/fields keeps showing the fields from the managed copy.

    If you want to see whether your most recent changes are taking effect, take a look at the managed-schema file within your config folder in zookeeper.

    Thanks everyone for your help.
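To confirm which schema is live, one option is to pull the config set back out of zookeeper with `zkcli.sh -cmd downconfig` and compare the field names in schema.xml with those in managed-schema. A rough sketch: the helper names and the `/tmp/test_conf` directory are made up for illustration, and the `sed` extraction assumes each `<field>` element sits on one line with `name` as its first attribute:

```shell
#!/bin/sh
# Hypothetical helper: print the sorted field names declared in a schema file.
# Assumes lines of the form <field name="..." ...>; <fieldType> is not matched.
field_names() {
  sed -n 's/^[[:space:]]*<field[[:space:]][[:space:]]*name="\([^"]*\)".*/\1/p' "$1" | sort
}

# Hypothetical helper: list fields present in the managed schema ($2)
# but absent from the hand-edited schema.xml ($1).
only_in_managed() {
  a="$(mktemp)"; b="$(mktemp)"
  field_names "$1" > "$a"
  field_names "$2" > "$b"
  comm -13 "$a" "$b"   # keep only lines unique to the second file
  rm -f "$a" "$b"
}

# Usage, after downloading the config set (the target directory is an example):
#   /opt/src/solr/scripts/cloud-scripts/zkcli.sh \
#     -zkhost 33.33.33.30:2181,33.33.33.31:2181,33.33.33.32:2181 \
#     -cmd downconfig -confdir /tmp/test_conf -confname test_conf
#   only_in_managed /tmp/test_conf/schema.xml /tmp/test_conf/managed-schema
```

Any name it prints is a field that was removed from schema.xml but is still defined in the managed copy that SOLR actually loads.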