Tags: java, lucene, similarity, solr-boost

How is Lucene boosting affected by lengthNorm in a custom Similarity?


I have two docs containing:

doc_1: one two three four five Bingo

doc_2: Bingo one two three four five

I index each document into two fields: one field contains the first five terms, and the other contains the last term.

// index is the position of the 5th ' ' in content
TextField start_field = new TextField("start_words", content.substring(0, index), Field.Store.NO);
TextField end_field = new TextField("end_words", content.substring(index, content.length() - 1), Field.Store.NO);

In order to see boosting results better, I have implemented the following similarity:

DefaultSimilarity customSimilarity = new DefaultSimilarity() {
     @Override
     public float lengthNorm(FieldInvertState state) {
         return 1; // So length of each field would not matter
     }
};

Without applying any boost, searching for Bingo results in both documents having the same score (as expected and intended). However, when applying a boost to one of the fields (start_field.setBoost(5)), both scores remain identical, although doc_2's field containing Bingo was boosted.

If I remove the customSimilarity, boosting works as expected.

Why does overriding lengthNorm stop boosting from working, and how can I make boosting work with the overridden Similarity shown above?


Solution

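  • The default implementation of lengthNorm() in DefaultSimilarity returns state.getBoost() multiplied by a length factor computed from the number of terms (1 / sqrt(numTerms) in Lucene 4.x). The index-time field boost is therefore folded into the value that lengthNorm() returns.

    Your implementation returns a constant and never factors in the boost, so the boost is discarded. To make your boosts matter while still ignoring field length, you could just have your implementation return state.getBoost().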
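The arithmetic behind this can be sketched in plain Java without any Lucene classes. The class NormModel and its three methods are hypothetical names made up for this illustration; they only model the value that lengthNorm() would return and ignore how Lucene later stores that value in the index. In the real override, the fix amounts to returning state.getBoost() instead of 1.

```java
// Illustrative model only: these are NOT Lucene classes or Lucene methods.
final class NormModel {

    // Shape of the default behaviour described above: boost * 1/sqrt(numTerms).
    public static float defaultNorm(float boost, int numTerms) {
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    // The override from the question: a constant, so the boost is thrown away.
    public static float constantNorm(float boost, int numTerms) {
        return 1f;
    }

    // The suggested fix: ignore the field length but keep the boost.
    public static float boostOnlyNorm(float boost, int numTerms) {
        return boost;
    }

    public static void main(String[] args) {
        int numTerms = 5; // e.g. a field holding five terms

        // Compare an unboosted field (1.0) with a field boosted by 5.
        for (float boost : new float[] {1f, 5f}) {
            System.out.println("boost=" + boost
                    + " default=" + defaultNorm(boost, numTerms)
                    + " constant=" + constantNorm(boost, numTerms)
                    + " boostOnly=" + boostOnlyNorm(boost, numTerms));
        }
    }
}
```

In this model the constant variant yields the same value for both boosts, which matches the identical scores observed in the question, while the boost-only variant differs by exactly the boost factor and is independent of numTerms.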