solr, full-text-search, solrnet

Return all search hits in the highlighted list in Solr using SolrNet


We are running a Solr-based search over the content of text files. The requirement is to return every hit of the search term in each document, along with the highlighted text around each hit.

We can return the number of documents found, along with the highlighted snippet around the first hit of the search term in each document. But it does not return the list of highlights for every place in the document where the search term is found. The TermFrequency is reported as the correct number, but we do not get snippets around all of these occurrences.

Relevant portion of the solr schema:

<field name="Content" type="text_general" indexed="false" stored="true" required="true"/>
<field name="ContentSearch" type="text_general" indexed="true" stored="false" multiValued="true"/>

<copyField source="Content" dest="ContentSearch"/>

For example, suppose a.txt and b.pdf are indexed, and the search term "case" occurs multiple times in both documents (a.txt - 7 hits, b.pdf - 10 hits). When we search for "case" across both documents, two documents are returned with the correct term frequencies (7 and 10), but the highlight list for each contains only one record, which corresponds to the first hit in the file.

Does this have something to do with using the TermVectorComponent for the content field? I have read about it but could not quite work out how the TVC works and in which situations it is helpful.


Solution

  • This is due to the default settings for highlighting. To get what you want, I would recommend changing the snippets (hl.snippets) and maxAnalyzedChars (hl.maxAnalyzedChars) options. By default, snippets is 1, so only one snippet is returned per field, and maxAnalyzedChars is 51200, so only the first 51200 characters of the field are examined. I would set snippets=20 (or some value larger than the expected maximum number of snippets) and maxAnalyzedChars=100000 (or some other value larger than the longest field value). This ensures that the entire value is analyzed and that all highlights are returned.

    Note: You may also need to adjust the fragsize setting to get snippets of the appropriate size (for example, to include the line before and after the highlighted word), since the default fragment size is 100 characters.

    Within SolrNet, you need to set the Snippets and MaxAnalyzedChars properties on the HighlightingParameters that you pass to your query, similar to the following:

       var results = solr.Query(new SolrQueryByField("ContentSearch", "case"),
         new QueryOptions {
           Highlight = new HighlightingParameters {
             Fields = new[] {"ContentSearch"},
             Snippets = 20,             // maximum number of snippets returned per field
             MaxAnalyzedChars = 100000  // number of characters of the field value to scan
           }
         });
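
To check the settings independently of SolrNet, it can help to see the same options as raw Solr request parameters. The following is only a rough sketch in Python, not the SolrNet API: the helper name build_highlight_query is hypothetical, the field and values are taken from the example above, and it only builds the query string without contacting a server. The hl, hl.fl, hl.snippets, hl.maxAnalyzedChars and hl.fragsize names are the standard Solr highlighting parameters.

```python
from urllib.parse import urlencode


def build_highlight_query(term, field="ContentSearch", snippets=20,
                          max_analyzed_chars=100000, fragsize=100):
    """Build the query string for a Solr select request with highlighting.

    Hypothetical helper for illustration; the defaults mirror the values
    suggested in the answer above.
    """
    params = {
        "q": f"{field}:{term}",                    # search the indexed field
        "hl": "true",                              # turn highlighting on
        "hl.fl": field,                            # field(s) to highlight
        "hl.snippets": snippets,                   # max snippets per field (Solr default: 1)
        "hl.maxAnalyzedChars": max_analyzed_chars,  # chars scanned (Solr default: 51200)
        "hl.fragsize": fragsize,                   # approximate snippet size in characters
        "wt": "json",                              # response format
    }
    return urlencode(params)


query_string = build_highlight_query("case")
print(query_string)
```

Appending the resulting string to the core's select URL (the host and core name depend on your installation) and opening it in a browser should show whether Solr itself returns multiple snippets, which helps separate a Solr configuration issue from a SolrNet one.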