Solr search is not working properly if the string contains a colon


I have a field with the value ontwikkelingsdoelstellingen: (including the trailing colon).

It is indexed in Solr like this:

"tcngramm_X3b_nl_title":["ontwikkelingsdoelstellingen:"],

When I search for ontwikkelingsdoelstelling, I get the result.

But when I search for ontwikkelingsdoelstellingen or ontwikkelingsdoelstellinge, I get no result.

I also checked this in the Solr admin UI, using a query like this one: http://example.com/solr/user-owned/select?debugQuery=on&q=tcngramm_X3b_nl_title%3Aontwikkelingsdoelstelling

What is the issue here?

Updated:

I have another field in the index, tcngramm_X3b_nl_rendered_item, whose value is a long description such as:

In uitvoering van de Duurzame Ontwikkelingsdoelstellingen

This is a part of the value.

If I search this field, tcngramm_X3b_nl_rendered_item, for ontwikkelingsdoelstellingen, it also gives me no results.

Here it works without the last two characters, "en": [screenshot]

And here it does not work with the actual word: [screenshot]

This is the field type: [screenshot of the field type definition]


Solution

  • OK, the issue was in the Solr config files, not in the colon.

    The field type was defined like this:

    <fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="accents_und.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_und.txt"/>
        <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" protected="protwords_und.txt" splitOnCaseChange="0" generateWordParts="1" preserveOriginal="1" catenateAll="0" catenateWords="1"/>
        <filter class="solr.LengthFilterFactory" min="2" max="100"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="25"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="accents_und.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LengthFilterFactory" min="2" max="100"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    

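    Note that maxGramSize is 25, while ontwikkelingsdoelstellingen has 27 characters. The index analyzer only stores n-grams of 2 to 25 characters, and the query analyzer does not split the search term into n-grams, so the whole search term has to match one of the stored grams. The 25-character ontwikkelingsdoelstelling still fits, but the 26-character ontwikkelingsdoelstellinge and the 27-character ontwikkelingsdoelstellingen have no gram to match. Increasing the value of maxGramSize fixed my issue.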
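
    For illustration, the changed line in the index analyzer might look like the sketch below. The value 40 is only an assumption chosen for the example, not the value from my actual config. It just needs to be at least as long as the longest term you expect people to search for.

    ```xml
    <!-- Index analyzer of the text_ngram field type; all other filters stay as they were. -->
    <!-- maxGramSize="40" is an assumed example value: pick one that covers your longest search terms. -->
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="40"/>
    ```

    Because this changes index-time analysis, existing documents have to be reindexed before the longer grams become searchable. A larger maxGramSize also produces more grams per token, so expect the index to grow.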