I am trying to implement PorterStemFilterFactory in my analyzer during indexing .But when i query for documents,the output dont have documents which I got before adding the above analyzer.How can I get documents with both stemming and normal filters.
schema:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" "/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
<filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
when I search for query "agile" with below analyzer,it returned documents where the query were found.
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" "/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
<filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
Thanks in Advance
So the PorterStemFilterFactory
removes common endings from words.
In your case the word agile
is reduced to agil
.
You can check here https://tartarus.org/martin/PorterStemmer/voc.txt. (search here for the word agile).
Now search here for the corresponding output after applying Porter Stemming. https://tartarus.org/martin/PorterStemmer/output.txt
You will see you cant find the word agile
, because it is stemmed to agil
.
That is why you are not able to search for agile
, since there is no document that exists with that word . try searching for agil
and you should see the results.