Tags: solr, solr4, solrcloud

Solr: stemming in a live cluster (reindexing issues)


I have a live Solr cluster where stemming was not enabled and my schema.xml looks like this:

..
<field name="Searchable_Text" type="text_general" indexed="true" stored="true" multiValued="false"/> 
..
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
..
<copyField source="Searchable_Text" dest="text" maxChars="3000"/>
..
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
..

These are the steps I took to enable stemming on the live cluster.

First, I changed schema.xml to include stemming in the index analyzer:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

I also disabled openSearcher in solrconfig.xml:

<autoCommit> 
   <maxTime>${solr.autoCommit.maxTime:60000}</maxTime> 
   <openSearcher>false</openSearcher>  <!-- was set to true earlier-->
</autoCommit>

I then reindexed all of my data. My assumption is that the data is committed, but since openSearcher is set to false, the newly indexed data is not yet visible to searches.

After this, I changed schema.xml to include stemming in the query analyzer as well, and changed solrconfig.xml to set openSearcher back to true:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
</fieldType>

and

<autoCommit> 
   <maxTime>${solr.autoCommit.maxTime:60000}</maxTime> 
   <openSearcher>true</openSearcher>  <!-- set back to true -->
</autoCommit>

I then reloaded the core, but I still don't see my queries being stemmed. A debugQuery check doesn't show stemming in the parsed query either. It's quite weird. Is there anything wrong with my approach?
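
For reference, a debugQuery check of this kind can be run against each node separately. The following is only a minimal sketch: the host names, the core name `collection1`, and the helper function names are hypothetical, while `debugQuery=true`, `distrib=false`, and `wt=json` are standard Solr request parameters. `distrib=false` keeps the request on the node that receives it, so the parsed query reflects that node's own query analyzer.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical node addresses and core name; substitute your own.
NODES = ["http://solr-node1:8983", "http://solr-node2:8983"]
CORE = "collection1"


def debug_query_url(node, core, query):
    """Build a /select URL that asks a single node how it parses `query`."""
    params = urlencode({
        "q": query,
        "debugQuery": "true",  # include the parsed query in the response
        "distrib": "false",    # do not fan out to other shards/nodes
        "rows": "0",           # only the debug section is of interest
        "wt": "json",
    })
    return f"{node}/solr/{core}/select?{params}"


def parsed_query(response):
    """Extract the parsed query string from a decoded JSON response."""
    return response.get("debug", {}).get("parsedquery")


def check_nodes(nodes, core, query):
    """Query every node directly and return {node: parsed query}."""
    results = {}
    for node in nodes:
        with urlopen(debug_query_url(node, core, query), timeout=10) as resp:
            results[node] = parsed_query(json.load(resp))
    return results
```

Calling `check_nodes(NODES, CORE, "text:running")` would then show, per node, whether the query term comes back stemmed (the Porter stemmer reduces `running` to `run`). A node that still reports the unstemmed term has presumably not picked up the new query analyzer.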

I am using Solr 4.7.

Thanks


Solution

  • Well, I had done something stupid that kept this from working. The steps above did work for me, except that when I reloaded the core, I sent the reload through the LB VIP and not to each individual machine (!), so the reload presumably reached only one node. Reloading the core on each machine directly solved my problem.

    Anyway, thanks everybody!
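
    The per-machine reload can be sketched roughly as follows. This is a minimal sketch and not the exact commands I ran: the host names, the core name `collection1`, and the helper function names are hypothetical. The only Solr-specific piece is the CoreAdmin `RELOAD` action, which is sent to each node's own address rather than to the load balancer.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical node addresses and core name; substitute your own.
# These must be the individual machines, not the load balancer VIP.
NODES = ["http://solr-node1:8983", "http://solr-node2:8983"]
CORE = "collection1"


def reload_url(node, core):
    """Build the CoreAdmin RELOAD URL for one core on one node."""
    params = urlencode({"action": "RELOAD", "core": core, "wt": "json"})
    return f"{node}/solr/admin/cores?{params}"


def reload_all(nodes, core):
    """Send RELOAD to every node directly and return {node: HTTP status}."""
    statuses = {}
    for node in nodes:
        with urlopen(reload_url(node, core), timeout=30) as resp:
            statuses[node] = resp.status
    return statuses
```

    Calling `reload_all(NODES, CORE)` issues one reload per machine, so no node keeps serving queries with the old schema.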