I have a field with the value "ontwikkelingsdoelstellingen:".
It is indexed in Solr as follows:
"tcngramm_X3b_nl_title":["ontwikkelingsdoelstellingen:"],
When I search for ontwikkelingsdoelstelling, I get the result.
But when I search for ontwikkelingsdoelstellingen or ontwikkelingsdoelstellinge, I get no result.
I also checked this in the Solr admin UI using the Query screen:
http://example.com/solr/user-owned/select?debugQuery=on&q=tcngramm_X3b_nl_title%3Aontwikkelingsdoelstelling
What is the issue here?
Updated:
I have another field in the index, tcngramm_X3b_nl_rendered_item, whose value is a long description. Part of that value is:
In uitvoering van de Duurzame Ontwikkelingsdoelstellingen
If I search this field, tcngramm_X3b_nl_rendered_item, for ontwikkelingsdoelstellingen, I also get no results.
Here too, the search works without the last two characters (en), but not with the full word.
OK, the issue was in the Solr config files.
The field type was defined like this:
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="accents_und.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_und.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" protected="protwords_und.txt" splitOnCaseChange="0" generateWordParts="1" preserveOriginal="1" catenateAll="0" catenateWords="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="accents_und.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
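Note that maxGramSize is 25, while ontwikkelingsdoelstellingen has 27 characters. The index analyzer therefore never emits a gram longer than 25 characters, and because the query analyzer does not split the search term into n-grams, a 26- or 27-character term has nothing to match. Increasing the value of that attribute fixed my issue.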
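To make the length mismatch concrete, here is a rough Python sketch of the matching behaviour. It is not Solr code and it ignores the other filters in the chain. The helper names ngrams and matches are made up for illustration, and the larger max_size of 30 is only an assumed example value, not the one actually used in the fix.

```python
def ngrams(token, min_size=2, max_size=25):
    """Return every substring of token with a length from min_size to max_size."""
    grams = set()
    for size in range(min_size, max_size + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def matches(query, indexed_grams):
    """Mimic the query analyzer: lowercase only, no n-gramming of the query term."""
    return query.lower() in indexed_grams


# Index-time grams for the stored value, using the original maxGramSize of 25.
indexed = ngrams("ontwikkelingsdoelstellingen:", min_size=2, max_size=25)

print(len("ontwikkelingsdoelstelling"), matches("ontwikkelingsdoelstelling", indexed))
print(len("ontwikkelingsdoelstellinge"), matches("ontwikkelingsdoelstellinge", indexed))
print(len("ontwikkelingsdoelstellingen"), matches("ontwikkelingsdoelstellingen", indexed))

# With a larger (assumed) maximum gram size, the full word becomes an indexed gram.
reindexed = ngrams("ontwikkelingsdoelstellingen:", min_size=2, max_size=30)
print(matches("ontwikkelingsdoelstellingen", reindexed))
```

The 25-character query is found among the indexed grams, while the 26- and 27-character queries are not until the maximum gram size is raised. Keep in mind that a change to the index analyzer only affects documents indexed afterwards, so existing documents need to be reindexed.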