java solr search-engine document

Solr query results - need searched text and a few lines around it


I am completely lost. I think I am definitely missing something fundamental here. Everybody has such awesome stuff to say about Solr but I fail to see it.

I indexed a structured PDF document in Solr. The problem is that when I search for a simple string, I get the entire content field as the response, and I don't know how to change that. My requirement is this: let's say I search for "metadata"; it should give me something like

"MetadataDiscussion . . . 4 matches ... make sure that Tika users have a chance to get to all of the metadata created and/or extracted by Tika. == Original Problem == The original inspiration for this page was a Tika ... 10.7k - rev: 2 (current) last modified: 2010-08-02 18:09:45 "

But it gives me the whole document, that is, the entire string that was indexed. It seems like Lucene can only tell me which field the match occurred in, not where in the field it occurred.

Any help will be greatly appreciated!!


Solution

  • Lucene/Solr is primarily a retrieval engine: it retrieves the documents that match a query, so returning the whole stored field is the expected behavior. As for your requirement, you can use Solr's highlighting feature to get exactly that: short fragments of the field around each match. Highlighting works from the stored value, so the field has to be stored. Suppose your document text is stored in a field named text; then you would add the following parameters to your query request:

    &hl=true&hl.fl=text&hl.snippets=5&hl.fragsize=200
    

    Look through the other highlighting parameters (the hl.* family) to customize it even further.

    Solr is amazing :)
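
    To show where those parameters go, here is a minimal Java sketch that just assembles the request URL. The host, the core name mycore, the class and method names, and the query text:metadata are all assumptions made up for illustration; only the hl parameters come from the answer above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class HighlightQueryExample {

    // Builds a Solr /select URL that searches `field` for `term`
    // and asks Solr to return highlighted fragments of that field.
    public static String buildHighlightUrl(String baseUrl, String field, String term) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", field + ":" + term);   // hypothetical query
        params.put("hl", "true");              // turn highlighting on
        params.put("hl.fl", field);            // field(s) to build snippets from
        params.put("hl.snippets", "5");        // at most 5 fragments per field
        params.put("hl.fragsize", "200");      // roughly 200 characters per fragment

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // URL-encode each value so characters like ':' are safe in the query string.
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "/select?" + query;
    }

    public static void main(String[] args) {
        // Assumed local Solr instance and core name; adjust to your setup.
        System.out.println(buildHighlightUrl("http://localhost:8983/solr/mycore", "text", "metadata"));
    }
}
```

    The matching documents still come back in the normal result list; the fragments arrive in a separate highlighting section of the response, keyed by each document's unique key, with the matched terms wrapped in <em> tags by default. If you only want the snippets, you can also restrict the returned fields with the fl parameter so the full text field is not sent back.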