lucene scoring

How to define a boost factor for each term in each document during indexing?


I want to add another score factor to Lucene's similarity equation. The problem is that I can't just override the Similarity class, because it is unaware of the document and terms for which it is computing scores.

For example, in a document with the text below:

The cat is in the top of the tree, and he is going to stay there.

I have my own algorithm that assigns each term in this document a score reflecting how important that term is to the document as a whole. Possible scores for the words are:

cat: 0.789212
tree: 0.633423
top: 0.412315
stay: 0.123912
there: 0.0999842
going: 0.00988412
...

The score for each word differs from document to document. For example, in another document, cat could have a score of 0.0023912.

I want to add this score to Lucene's scoring, but I'm kind of lost on how to do that.

Any tips?


Solution

  • Use Lucene's payload feature, which stores a small value (such as your per-term score) alongside each token at index time:

    From: http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/

    1. Add a payload to one or more tokens during indexing.
    2. Override the Similarity class to handle scoring payloads.
    3. Use a payload-aware query during your search.
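
To make the three steps more concrete, here is a minimal plain-Java sketch of the data flow only. It deliberately does not call Lucene, because the payload class and method names differ between Lucene versions. The class PayloadSketch and its methods are hypothetical names made up for this illustration, and the 4-byte big-endian float layout and the "term|weight" token shape are assumptions modeled on what Lucene's DelimitedPayloadTokenFilter and PayloadHelper work with, so check them against the version you use.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class for illustration; not part of Lucene.
final class PayloadSketch {

    // Step 1 (indexing): render per-term weights as "term|weight" tokens,
    // the input shape a delimiter-based payload token filter would split.
    public static String toDelimitedText(Map<String, Float> weights) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Float> e : weights.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append('|').append(e.getValue());
        }
        return sb.toString();
    }

    // Step 1 (indexing): a payload is just bytes, so the float weight is
    // packed into 4 bytes (big-endian is ByteBuffer's default order).
    public static byte[] encodeFloat(float weight) {
        return ByteBuffer.allocate(4).putFloat(weight).array();
    }

    // Step 2 (scoring): the overridden similarity would decode the bytes
    // back into the weight that was stored for this term in this document.
    public static float decodeFloat(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    // Steps 2 and 3: a payload-aware query multiplies the decoded weight
    // into the base score; a term without a payload is left unchanged.
    public static float scoreWithPayload(float baseScore, byte[] payload) {
        float factor = (payload == null) ? 1.0f : decodeFloat(payload);
        return baseScore * factor;
    }

    public static void main(String[] args) {
        // Weights taken from the example document in the question.
        Map<String, Float> weights = new LinkedHashMap<>();
        weights.put("cat", 0.789212f);
        weights.put("tree", 0.633423f);

        System.out.println(toDelimitedText(weights));

        byte[] catPayload = encodeFloat(weights.get("cat"));
        // 2.0f stands in for whatever base score Lucene computed.
        System.out.println(scoreWithPayload(2.0f, catPayload));
    }
}
```

Because the payload is stored per token occurrence, the same term can carry a different weight in every document, which is exactly what the Similarity class alone cannot see.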