solr highlighting kinosearch

Google-like Fragmenter for Solr?


I am implementing a Solr application that had originally used KinoSearch.

I have now moved everything to Solr and built a results page, but I notice a difference in the results. Specifically, the highlighting is not quite the same.

With KinoSearch, there is the KinoSearch::Highlight::Highlighter object, which appears to produce fragments similar to Google's: it tries to break around sentences and, if a fragment breaks mid-sentence, adds an ellipsis (...) separated from the text by a space.

Does anybody have any suggestions for a way to implement something similar with Solr? I have tried the regex fragmenter to break at sentences, but it seems to apply the regular expression in reverse: fragments start with the period from the previous sentence.
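For reference, this is a minimal sketch of the kind of request parameters involved when selecting the regex fragmenter. The field name `content`, the helper name `highlight_params`, the numeric values, and the pattern itself are illustrative assumptions, not my actual configuration, and nothing here is claimed to fix the leading-period behavior.

```python
from urllib.parse import urlencode


def highlight_params(query, field="content"):
    """Build Solr query parameters that select the regex fragmenter.

    `field` is a hypothetical field name; the sizes, slop, and pattern
    are placeholders to be tuned against real documents.
    """
    return {
        "q": query,
        "hl": "true",
        "hl.fl": field,                # field(s) to highlight
        "hl.snippets": 3,              # max fragments per field
        "hl.fragsize": 150,            # target fragment size in characters
        "hl.fragmenter": "regex",      # use the regex fragmenter
        "hl.regex.slop": 0.5,          # allowed deviation from hl.fragsize
        # Illustrative pattern: a run of word-like characters, 20-200 long.
        "hl.regex.pattern": r"[-\w ,/\n\"']{20,200}",
    }


params = highlight_params("solr highlighting")
query_string = urlencode(params)  # append to the /select URL of your core
```

The resulting `query_string` can be appended to a select request; whether the fragments then break where expected depends on the pattern and on the Solr version.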

I can add the ellipsis logic in the view code. I'm just wondering if anybody has encountered something similar and how they handled it.

Thanks!


Solution

  • My question had two parts. The first issue, the fragmenter seeming not to follow the regular expression and putting a period at the start of every fragment, is addressed here: http://lucene.472066.n3.nabble.com/Basic-sentence-parsing-with-the-regex-highlighter-fragmenter-td505749.html

    The second issue, the ellipsis, I am going to implement in the front-end code.

    I will leave this question open as I'm still curious if a better solution exists.
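A minimal sketch of what the front-end ellipsis logic could look like, assuming the fragments arrive as strings with matches wrapped in `<em>` tags. The function name `add_ellipses` and the heuristics (a lowercase first letter means the fragment starts mid-sentence; missing terminal punctuation means it ends mid-sentence) are assumptions for illustration, not KinoSearch's actual algorithm.

```python
import re

# Highlight markup to ignore when inspecting the visible text.
HL_TAG = re.compile(r"</?em>")


def add_ellipses(fragment, ellipsis="..."):
    """Add a space-separated ellipsis where a fragment breaks mid-sentence."""
    # Drop surrounding whitespace and any stray sentence punctuation that
    # was carried over from the end of the previous sentence.
    text = fragment.strip().lstrip(".!? ")
    if not text:
        return text

    plain = HL_TAG.sub("", text)
    # Heuristic: a lowercase first letter suggests a mid-sentence start.
    starts_mid = plain[:1].islower()
    # Heuristic: no terminal punctuation (ignoring closing quotes and
    # brackets) suggests the fragment was cut before the sentence ended.
    ends_mid = not plain.rstrip("\"')]").endswith((".", "!", "?"))

    if starts_mid:
        text = ellipsis + " " + text
    if ends_mid:
        text = text + " " + ellipsis
    return text


print(add_ellipses("quick brown <em>fox</em> jumps over"))
print(add_ellipses(". The <em>fox</em> ran and"))
```

The capitalization check is deliberately crude: fragments that begin with a proper noun or a number will be treated as sentence starts, so it may need adjusting for real content.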