lucene, lucene.net, highlighting

Extract terms from query for highlighting


I'm extracting terms from a query by calling ExtractTerms() on the Query object returned by QueryParser.Parse(). I get a Hashtable, but each item appears as:

Key - term:term
Value - term:term
  1. Why are the key and the value the same? And why is the term value duplicated and separated by a colon?
  2. Do highlighters only insert tags, or do they do anything else? I want not only to get text fragments but to highlight the whole source text (it is fairly large). I am trying to get the terms and insert tags by hand using their offsets, but I am not sure this is the right approach.

Solution

    1. This is because .NET 2.0 has no equivalent of Java's HashSet, so the .NET port uses a Hashtable with the same object as both key and value. The colon you see is just the output of Term.ToString(): a Term is a field name plus the term text, so your field name is probably "term".

    2. To highlight an entire document with the Highlighter contrib, use the NullFragmenter, which treats the whole text as a single fragment instead of splitting it.
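
The first point can be sketched in plain Java without any Lucene dependency. This is only an illustration, not the library's code: SketchTerm is a hypothetical stand-in for Lucene's Term, and addToSet is an invented helper showing how a Hashtable can emulate a set by storing each element as both key and value.

```java
import java.util.Hashtable;
import java.util.Map;

final class TermSetSketch {

    // Hypothetical stand-in for Lucene's Term: a field name plus the term text.
    static final class SketchTerm {
        private final String field;
        private final String text;

        SketchTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        String field() { return field; }
        String text() { return text; }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SketchTerm)) return false;
            SketchTerm other = (SketchTerm) o;
            return field.equals(other.field) && text.equals(other.text);
        }

        @Override
        public int hashCode() {
            return 31 * field.hashCode() + text.hashCode();
        }

        // Renders the term as "field:text", which is where the colon comes from.
        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    // Emulates set.add(term) on a Hashtable: the term is stored as key and value.
    static void addToSet(Hashtable<SketchTerm, SketchTerm> set, SketchTerm term) {
        set.put(term, term);
    }

    public static void main(String[] args) {
        Hashtable<SketchTerm, SketchTerm> terms = new Hashtable<>();
        addToSet(terms, new SketchTerm("term", "term"));
        // An equal term maps to the same key, so the table still holds one entry.
        addToSet(terms, new SketchTerm("term", "term"));

        for (Map.Entry<SketchTerm, SketchTerm> e : terms.entrySet()) {
            System.out.println("Key - " + e.getKey() + ", Value - " + e.getValue());
            // Read the parts from the term itself instead of parsing the string.
            System.out.println("field = " + e.getKey().field()
                    + ", text = " + e.getKey().text());
        }
    }
}
```

Under these assumptions, iterating the table shows the same "field:text" string for key and value, which matches what the question describes. To get the bare term text, it is safer to read it from the term object than to split the string on the colon.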