
Does removing a <charFilter> from solr schema.xml require a re-index?


I have a Solr 4.3.1 core that already contains indexed data. Here is the portion of its schema.xml that defines the field in question, the "text" field:

<fields>
    <field name="text" type="text" indexed="true" stored="true" required="false" />
</fields>
<types>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
            <charFilter class="solr.HTMLStripCharFilterFactory" />
            <tokenizer class="solr.StandardTokenizerFactory" />
            <filter class="solr.StandardFilterFactory" />
            <filter class="solr.TrimFilterFactory" />
            <filter class="solr.ICUFoldingFilterFactory" />
        </analyzer>
    </fieldType>
</types>

I need to remove the <charFilter> part. The HTMLStripCharFilterFactory has a bug that makes it unusable in this scenario (see https://issues.apache.org/jira/browse/SOLR-2834): the solrj client cannot handle the response from an analysis request to Solr. As far as I can tell the bug is present in all versions of Solr 4, and it doesn't look like it will be fixed any time soon. I also don't actually use the HTMLStripCharFilterFactory. It was put in place at some point but never used. As a result I have an unused feature in my schema that is blocking me, and I want to remove it.

I have a test environment with a copy of all of the data which I have experimented with. In my test, I stopped the tomcat server that was running solr, removed that <charFilter> line, and restarted tomcat. I did not see any negative impact from the change and now solrj is able to properly handle things and I get the results I am expecting. At this point I feel like I can just make the change to schema.xml and that is all I need to do.

However, when I read pages like http://wiki.apache.org/solr/HowToReindex it makes it sound like I would need to reindex because I'm changing schema.xml.

So in the end, can anyone verify if I would need to re-index or not? What are the risks, if any, to making this change to schema.xml without re-indexing?


Solution

  • It depends on what kind of changes you make to the schema.

    If you make changes to schema.xml that affect existing documents, you need to reindex for those changes to be applied to the documents already in your Solr index. You can change schema.xml without reindexing, but your index can then become inconsistent, because documents added afterwards are indexed under the new schema while existing documents keep what was indexed under the old one.

    In your example, if you remove the charFilter from schema.xml and do not reindex, the documents already in the index will have been indexed with the charFilter applied, while documents indexed from then on will not, so there will be an inconsistency. It is therefore recommended to reindex whenever a change applies to documents already in the index.

    Only in very rare cases is a reindex not required.
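
    As a sketch based on the schema in the question, and assuming nothing else in the field type is touched, the definition after removing the charFilter would look roughly like this. Because the <analyzer> element has no type attribute, the same chain is used at both index time and query time, so the removal changes both.

    ```xml
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
        <!-- No type="index" or type="query": this analyzer applies to both -->
        <analyzer>
            <!-- The HTMLStripCharFilterFactory charFilter has been removed here -->
            <tokenizer class="solr.StandardTokenizerFactory" />
            <filter class="solr.StandardFilterFactory" />
            <filter class="solr.TrimFilterFactory" />
            <filter class="solr.ICUFoldingFilterFactory" />
        </analyzer>
    </fieldType>
    ```

    Documents indexed before this change keep the terms that were produced with the charFilter in place until they are reindexed.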