I am trying to use the highlighting feature of Solr 4.4. After some experimenting it finally works, but not the way I expected.
General setting: I have a text field and a title field. Both are indexed and searched, but highlighting is only needed in the title field.
What I tried for the title field so far:
1. Field type string => no highlighting results, even though the field was stored.
2. Field type text_ws (only the WhitespaceTokenizer). I was not sure whether I had to index the title field, so I did. => Highlighting works, but only for direct matches (q=Apple did not highlight Apple-Pie in the title, q=Apple-Pie did).
3. Added ngram to the title field. Now q=apple gives a hit, but the complete Apple-pie is highlighted, not only the query term.

Now for the question: is that the expected behaviour, or is there a way to highlight only the query term?
EDIT: snippets for..
.. solrconfig.xml
<requestHandler name="/query" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
    <str name="df">text</str>
    <str name="defType">edismax</str>
    <str name="qf">title, text</str>
    <str name="hl">true</str>
    <str name="hl.fl">title</str>
    <str name="hl.simple.pre"><![CDATA[<b class="text-success">]]></str>
    <str name="hl.simple.post"><![CDATA[</b>]]></str>
  </lst>
</requestHandler>
.. schema.xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" preserveOriginal="1" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="German" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" preserveOriginal="1" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="German" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>
<fieldType name="text_ngrammed" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="10" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <!-- <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/> -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
<fields>
  <!-- IDs -->
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <!-- Content -->
  <field name="title" type="text_ngrammed" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
  <field name="text" type="text" indexed="true" stored="false" multiValued="true" />
</fields>
I changed the tokenizer from WhitespaceTokenizerFactory to NGramTokenizerFactory and removed the NGramFilterFactory. Now it is (almost) working as expected.
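
For reference, the change described above might look roughly like this in schema.xml. This is only a sketch under assumptions: the change is applied to the index analyzer only, the gram sizes are carried over from the removed NGramFilterFactory, and the remaining filters are left as they were. A likely reason for the difference is that the tokenizer records start and end offsets for each gram, whereas the filter seems to keep the offsets of the whole whitespace-separated token, and the highlighter marks whatever those offsets cover.

```xml
<!-- Sketch only: assumes the change applies to the index analyzer and that
     the gram sizes (3 to 10) are carried over from the removed filter. -->
<fieldType name="text_ngrammed" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <!-- The tokenizer now produces the n-grams itself -->
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="10" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <!-- Query analyzer unchanged: the query term is matched as one lowercased token -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
```

The documents have to be reindexed after changing the index analyzer. Since the tokenizer now works on the raw input, grams may also span whitespace, which could be part of why the result is only almost as expected.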