
Solr: Phrase search with special characters


We need to run Solr phrase searches like these:

"Success & Failure"
"Working 50%"

but the Solr query parser strips all special characters from the search, even when I escape them.

My search query is:

http://localhost:8080/solr/core0/select?q=%22Success%20\%26%20Failure%22&debugQuery=on

Here is the debugQuery output for it:

<lst name="debug">
   <str name="rawquerystring">"Success & Failure"</str>
   <str name="querystring">"Success & Failure"</str>
   <str name="parsedquery">PhraseQuery(text:"success failure")</str>
   <str name="parsedquery_toString">text:"success failure"</str>
   <lst name="explain"/>
   <str name="QParser">LuceneQParser</str>
   <lst name="timing"></lst>
</lst>

I searched the web for this and found advice saying that special characters must be indexed for this to work, because Solr does not index them by default.

To do so, I added solr.WordDelimiterFilterFactory to my TextField definition:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- <charFilter class="solr.MappingCharFilterFactory" mapping="char-mapping.txt"/> -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="0"
            splitOnNumerics="0"
            stemEnglishPossessive="0"
            generateWordParts="0"
            generateNumberParts="0"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            preserveOriginal="1"
            types="wdfftypes.txt"
            />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="0"
            splitOnNumerics="0"
            stemEnglishPossessive="0"
            generateWordParts="0"
            generateNumberParts="0"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            preserveOriginal="1"
            types="wdfftypes.txt"
            />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Even with this change, the & in the search term is still dropped.

We want Solr to search for "success & failure" without eliminating the special character.

Does anybody have an idea how to do this?


Solution

  • You should consider using solr.WhitespaceTokenizerFactory instead of solr.StandardTokenizerFactory, because the StandardTokenizer consumes special characters as word boundaries. With that change, you need to think about where you actually want the text split into words.

    Additionally, the WordDelimiterFilterFactory you are using may filter this character out. To prevent that, you should be able to define & as ALPHA in your type definition, as described in the question "How do I find documents containing digits and dollar signs in Solr?".

    That definition goes in the file referenced by types="wdfftypes.txt" in the declaration of your solr.WordDelimiterFilterFactory:

    & => ALPHA

    Further reading on how this file needs to be made up
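
    As a rough sketch of how the two suggestions could fit together, the field type might look something like the following. The name text_special is hypothetical, the single <analyzer> element (which Solr applies at both index and query time) is a simplification of the index/query pair in the question, and the stop word and synonym filters are left out for brevity. It assumes wdfftypes.txt contains the & => ALPHA mapping shown above; a similar % => ALPHA line would be an assumed addition for searches like "Working 50%". Documents that are already indexed would need to be reindexed after the index-time analysis changes.

    <!-- Sketch only: "text_special" is a hypothetical name, not from the question -->
    <fieldType name="text_special" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- Split on whitespace only, so "&" and "50%" reach the filters as tokens -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- Same settings as in the question; wdfftypes.txt is assumed to map & to ALPHA -->
        <filter class="solr.WordDelimiterFilterFactory"
                splitOnCaseChange="0"
                splitOnNumerics="0"
                stemEnglishPossessive="0"
                generateWordParts="0"
                generateNumberParts="0"
                catenateWords="0"
                catenateNumbers="0"
                catenateAll="0"
                preserveOriginal="1"
                types="wdfftypes.txt"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    Re-running the original query with debugQuery=on is a quick way to check whether the & now survives in parsedquery.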