
Solr - Wildcard Search Results Vary with Stemming Method


I have two versions of Solr running on my machine; call them SolrVer1 and SolrVer2.

SolrVer1 applies the following stemming filters to the field type text_en_splitting:

<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" ignoreCase="true"/>
<filter class="solr.PorterStemFilterFactory" ignoreCase="true"/>

SolrVer2 applies the following stemming filter to the same field type, text_en_splitting:

<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>

Both behave almost identically for regular searches, but with wildcard searches SolrVer1 fails to return results for some word forms.

For example, when searching for ray*, SolrVer1 returns far fewer results than SolrVer2. Looking at the results, I found that SolrVer1 does not return documents that contain only ray or rays.

I don't know when I should use SnowballPorterFilterFactory and when I should use PorterStemFilterFactory. What are the pros and cons of each?

Does anybody have an idea what causes this behavior?

Thanks


Solution

  • You need to know what each stemmer outputs for ray and rays.

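    Try stemming them with the Porter stemmer online tool: http://qaa.ath.cx/porter_js_demo.html. It outputs rai for both words! Wildcard terms are not passed through the stemmer at query time, so the prefix ray* is compared directly with the indexed term rai and cannot match it. That's the reason you don't get any matches for those documents with the Porter stemmer.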

    And here is a demo of the Snowball stemmer: http://snowball.tartarus.org/demo.php. It outputs ray for both ray and rays, which is why ray* matches and you get those results.

    For a comparison of the two stemmers, you may want to read: http://snowball.tartarus.org/texts/introduction.html

    It appears that Snowball's English stemmer was designed to address shortcomings of the original Porter stemmer, such as this one.
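To make the difference concrete, here is a minimal Python sketch that reimplements only the rules relevant to ray/rays: Step 1a (plural "s") and Step 1c (final "y") from the original Porter algorithm, and the corresponding Step 1a and Step 1c rules from the Snowball English (Porter2) algorithm. It is not a complete stemmer and not Lucene's code; the function names and the extra token "raymond" are hypothetical, chosen for illustration. It also assumes, as described above, that a wildcard prefix is compared unstemmed against the stemmed terms in the index.

```python
# Hypothetical, simplified subsets of two English stemmers. Only the rules
# that matter for "ray"/"rays" are implemented; neither is a full stemmer.

PORTER_VOWELS = set("aeiou")
PORTER2_VOWELS = set("aeiouy")


def porter_has_vowel(stem: str) -> bool:
    # Original Porter: "y" counts as a vowel only when preceded by a consonant.
    for i, ch in enumerate(stem):
        if ch in PORTER_VOWELS:
            return True
        if ch == "y" and i > 0 and stem[i - 1] not in PORTER_VOWELS:
            return True
    return False


def porter_subset(word: str) -> str:
    # Step 1a: SSES -> SS, IES -> I, SS -> SS, S -> ""
    if word.endswith("sses") or word.endswith("ies"):
        word = word[:-2]
    elif not word.endswith("ss") and word.endswith("s"):
        word = word[:-1]
    # Step 1c: (*v*) Y -> I, so a final "y" becomes "i" if the stem has a vowel
    if word.endswith("y") and porter_has_vowel(word[:-1]):
        word = word[:-1] + "i"
    return word


def porter2_subset(word: str) -> str:
    if len(word) <= 2:  # Porter2 leaves words of two letters or fewer alone
        return word
    # Mark an initial "y", or a "y" after a vowel, as the consonant "Y"
    chars = list(word)
    for i, ch in enumerate(chars):
        if ch == "y" and (i == 0 or chars[i - 1] in PORTER2_VOWELS):
            chars[i] = "Y"
    w = "".join(chars)
    # Step 1a (subset): "sses" -> "ss"; "ied"/"ies" -> "i" or "ie";
    # "us"/"ss" unchanged; otherwise drop a final "s" if a vowel occurs
    # somewhere before the letter immediately preceding it
    if w.endswith("sses"):
        w = w[:-2]
    elif w.endswith(("ied", "ies")):
        w = w[:-2] if len(w) > 4 else w[:-1]
    elif w.endswith(("us", "ss")):
        pass
    elif w.endswith("s") and any(c in PORTER2_VOWELS for c in w[:-2]):
        w = w[:-1]
    # Step 1c: final y/Y -> i only if preceded by a non-vowel that is not
    # the first letter of the word ("cry" -> "cri", but "ray" stays "ray")
    if len(w) > 2 and w[-1] in "yY" and w[-2] not in PORTER2_VOWELS:
        w = w[:-1] + "i"
    return w.replace("Y", "y")


def wildcard_prefix_match(prefix: str, indexed_terms: set[str]) -> list[str]:
    # The wildcard prefix is not stemmed, so it is compared as-is with the
    # already-stemmed terms stored in the index.
    return sorted(t for t in indexed_terms if t.startswith(prefix))


words = ["ray", "rays", "raymond"]  # hypothetical document tokens
porter_index = {porter_subset(w) for w in words}
porter2_index = {porter2_subset(w) for w in words}
print(sorted(porter_index), wildcard_prefix_match("ray", porter_index))
# ['rai', 'raymond'] ['raymond']
print(sorted(porter2_index), wildcard_prefix_match("ray", porter2_index))
# ['ray', 'raymond'] ['ray', 'raymond']
```

The key difference is Step 1c: the original Porter rule turns any final "y" into "i" once the stem contains a vowel, whereas Porter2 only does so after a consonant that is not the first letter, so "ray" keeps its "y" and still starts with the unstemmed wildcard prefix.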