c# lucene lucene.net

Meaning of boost 0 in Lucene query


What is the meaning of "^0" in a Lucene query?

The documentation says that the "^" operator is used for boosting a term in the query. But what happens when you use a boost value of 0?

The way it is used in the codebase I have as a reference makes me wonder whether it is an alternative to the "+" operator, that is, a way to make the term a must instead of a should. However, I cannot find anything confirming this theory.

An example of such a query is:

(SubscriptionId:"5938577c72c848271892f78d"^0)

Solution

  • The boost is a multiplier used when determining the relevance score for the specific term to which it is applied (in your example, that is the subscription ID).

    The effect of setting this multiplier to 0 is that the score associated with that term is also zero. But the term is still matched and its score is still calculated, so documents containing it can still be returned.

    Looking at your specific example again: because there is only one term in the query, and because that term has a boost of zero, the overall score of any hit will also be zero. There is no other "scorable" information in the query. So you get hits, and they all have a score of 0.

    If you were to combine this term with other terms (where those other terms are not "boosted down" to zero), the documents would still be found, but the term with the zero boost would not contribute anything to their overall score.

    A boost of zero does not make the term mandatory.
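
The points above can be sketched with a toy scoring model. This is not Lucene's actual scoring formula: the function name `toy_score` and the fixed raw weight of 1.0 per matching term are assumptions made purely for illustration. Each optional clause that matches adds its boost times its weight, and a document counts as a hit as soon as any clause matches.

```python
def toy_score(doc_terms, clauses):
    """Score one document against optional (SHOULD) clauses.

    clauses maps each query term to its boost. Returns None when no clause
    matches (the document is not a hit), otherwise the summed score.
    Assumption: every matching term has a raw weight of 1.0.
    """
    matched = [term for term in clauses if term in doc_terms]
    if not matched:
        return None
    return sum(clauses[term] * 1.0 for term in matched)


query = {"apples": 1.0, "oranges": 0.0}  # models: apples oranges^0

print(toy_score({"oranges"}, query))            # a hit, scored 0.0
print(toy_score({"apples"}, query))             # a hit without oranges
print(toy_score({"apples", "oranges"}, query))  # oranges adds nothing
print(toy_score({"bananas"}, query))            # None: no clause matches
```

In this model the zero boost changes only the score, never whether a document matches.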

    If you have a query such as this:

    apples oranges^0
    

    And if you have some documents containing only apples, others containing only oranges and some containing both terms, then you will get hits on all such documents.

    Here are some results from the classic query parser for the above example (with some documents also containing bananas):

    Query string: apples oranges^0
    Parsed query: body:apples (body:oranges)^0.0
    
      doc = 0
      score = 0.3648143
      field = apples
    
      doc = 2
      score = 0.2772589
      field = apples oranges
    
      doc = 4
      score = 0.2772589
      field = apples bananas
    
      doc = 1
      score = 0.0
      field = oranges
    
      doc = 5
      score = 0.0
      field = oranges bananas
    

    Notice how documents 1 and 5 have scores of zero - but are still returned to you.

    And notice how document 0 is also returned, even though it does not contain oranges: the zero-boost term is not mandatory.

    If you used the following query:

    apples oranges
    

    Then you would get the same hits, but in a different order, since the term oranges now also contributes to the overall relevance scores.
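
The change in ordering can be reproduced with a small toy model. It is only a sketch, not Lucene's real similarity: the names `toy_rank` and `DOCS` and the `1/sqrt(length)` term weight are assumptions made for illustration, and the real scores will differ. Document 3 does not appear in the results listed earlier, so it is left out.

```python
import math

# Document contents taken from the results listed above (doc id -> text).
DOCS = {
    0: "apples",
    1: "oranges",
    2: "apples oranges",
    4: "apples bananas",
    5: "oranges bananas",
}


def toy_rank(clauses):
    """Return the ids of matching documents, best score first.

    clauses maps each optional query term to its boost. Assumption: a
    matching term weighs 1/sqrt(number of terms in the document).
    """
    hits = []
    for doc_id, text in DOCS.items():
        terms = text.split()
        matched = [t for t in clauses if t in terms]
        if not matched:
            continue  # no clause matched, so the document is not a hit
        weight = 1.0 / math.sqrt(len(terms))
        score = sum(clauses[t] * weight for t in matched)
        hits.append((score, doc_id))
    # Highest score first; ties are broken by the lower document id.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in hits]


zero_boost = toy_rank({"apples": 1.0, "oranges": 0.0})  # apples oranges^0
no_boost = toy_rank({"apples": 1.0, "oranges": 1.0})    # apples oranges

print(zero_boost)
print(no_boost)
```

In this model both queries return the same five documents. Only the order changes, because oranges now adds to the score of every document that contains it.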

    (Negative boosts are not permitted - they will throw a parse error.)
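
To illustrate why a negative boost fails at parse time rather than at scoring time, here is a toy clause parser. It is not Lucene's grammar: the `CLAUSE` pattern and the `parse_clause` helper are hypothetical, and the sketch simply assumes that a boost must be written as an unsigned decimal number, so a minus sign cannot be part of it.

```python
import re

# Assumption: a clause is a bare word, optionally followed by ^ and an
# unsigned decimal number. A minus sign is not part of the number.
CLAUSE = re.compile(r"(\w+)(?:\^(\d+(?:\.\d+)?))?")


def parse_clause(text):
    """Return (term, boost) for one clause, or raise ValueError."""
    match = CLAUSE.fullmatch(text)
    if match is None:
        raise ValueError(f"cannot parse clause: {text!r}")
    term, boost = match.groups()
    # A clause without an explicit boost gets the neutral multiplier 1.0.
    return term, (float(boost) if boost is not None else 1.0)


print(parse_clause("oranges^0"))  # a boost of 0 is accepted
print(parse_clause("apples"))     # no boost given, so 1.0 is used

try:
    parse_clause("oranges^-1")
except ValueError as error:
    print(error)  # the negative boost is rejected by the parser
```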


    You can see an overview of how boosts fit into scoring here: scoring overview. The specific implementation details may vary (and you may be using a completely different custom scorer, of course). But this shows the general approach.