
Using Solr, what is the correct way to "add boosts" instead of using the "max" boost


Using the debug query feature and looking at the "explain" section, I realized that the boosts I have been using (https://stackoverflow.com/a/7701758/7096114) are combined with a "max of" comparison over the scores of the query matched against each field. In my system, I have 10 fields which are boosted based on certain values. I then sort the results by score in descending order, and I had assumed this score was the total of the points awarded for every field that matched. I didn't realize that the score was set to the maximum score calculated for any single boosted field. I want a result which matches all 10 of my fields, and so has a high total score (e.g., 500), to rank above a result which matches only 1 of my fields (e.g., 100), and I'm not quite sure how to handle that.

Example explain:

320.3237 = sum of:
  0.0069028055 = weight(custom_app:test in 7918) [SchemaSimilarity], result of:
    0.0069028055 = score(doc=7918,freq=1.0 = termFreq=1.0
), product of:
      0.006641347 = idf(docFreq=48698, docCount=49022)
      1.0393683 = tfNorm, computed from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        1.1020359 = avgFieldLength
        1.0 = fieldLength
  320.3168 = max of:
    73.23891 = weight(name_autocomplete:james in 7918) [SchemaSimilarity], result of:
      73.23891 = score(doc=7918,freq=1.0 = termFreq=1.0
), product of:
        6.066 = boost
        7.8911004 = idf(docFreq=32, docCount=86884)
        1.5300368 = tfNorm, computed from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.75 = parameter b
          6.527704 = avgFieldLength
          1.0 = fieldLength
    51.871056 = weight(name_partial_match:colin in 7918) [SchemaSimilarity], result of:
      51.871056 = score(doc=7918,freq=1.0 = termFreq=1.0
), product of:
        4.05 = boost
        7.8603234 = idf(docFreq=33, docCount=86843)
        1.6294072 = tfNorm, computed from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.75 = parameter b
          17.933905 = avgFieldLength
          1.0 = fieldLength
    9.736896 = weight(custom_name_phonetic_en:KLN in 7918) [SchemaSimilarity], result of:
      9.736896 = score(doc=7918,freq=1.0 = termFreq=1.0
), product of:
        1.6875 = boost
        5.4820786 = idf(docFreq=361, docCount=86884)
        1.0525228 = tfNorm, computed from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.75 = parameter b
          2.9156578 = avgFieldLength
          2.56 = fieldLength
    61.69854 = weight(custom_display_name_partial_match:colin in 7918) [SchemaSimilarity], result of:
      61.69854 = score(doc=7918,freq=1.0 = termFreq=1.0
), product of:
        5.0625 = boost
        7.532877 = idf(docFreq=46, docCount=86883)
        1.61789 = tfNorm, computed from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.75 = parameter b
          38.531185 = avgFieldLength
          2.56 = fieldLength
    86.66015 = weight(custom_name_autocomplete:colin in 7918) [SchemaSimilarity], result of:
      86.66015 = score(doc=7918,freq=1.0 = termFreq=1.0
), product of:
        7.5825 = boost
        7.6228366 = idf(docFreq=42, docCount=86884)
        1.4993064 = tfNorm, computed from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.75 = parameter b
          13.767955 = avgFieldLength
          2.56 = fieldLength
    9.267912 = weight(name_phonetic_en:KLN in 7918) [SchemaSimilarity], result of:
      9.267912 = score(doc=7918,freq=1.0 = termFreq=1.0
), product of:
        1.35 = boost
        6.1070633 = idf(docFreq=193, docCount=86884)
        1.1241279 = tfNorm, computed from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.75 = parameter b
          1.3697113 = avgFieldLength
          1.0 = fieldLength
    320.3168 = weight(name_lowercase:colin in 7918) [SchemaSimilarity], result of:
      320.3168 = score(doc=7918,freq=1.0 = termFreq=1.0
), product of:
        40.1 = boost
        7.9879503 = idf(docFreq=29, docCount=86884)
        1.0 = tfNorm, computed from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.75 = parameter b
          1.0 = avgFieldLength
          1.0 = fieldLength

Solution

  • If you want the scores of the other matching fields, and not only the maximum-scoring one, to contribute to the result, you can use the tie parameter.

    This parameter tells Solr how much of the score from the other fields that also generated hits to include in the final score. It's usually a low value, such as 0.1.

    The tie parameter specifies a float value (which should be something much less than 1) to use as tiebreaker in DisMax queries.

    When a term from the user’s input is tested against multiple fields, more than one field may match. If so, each field will generate a different score based on how common that word is in that field (for each document relative to all other documents). The tie parameter lets you control how much the final score of the query will be influenced by the scores of the lower scoring fields compared to the highest scoring field.

    A value of "0.0" - the default - makes the query a pure "disjunction max query": that is, only the maximum scoring subquery contributes to the final score. A value of "1.0" makes the query a pure "disjunction sum query" where it doesn’t matter what the maximum scoring sub query is, because the final score will be the sum of the subquery scores. Typically a low value, such as 0.1, is useful.
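To make the effect of tie concrete, here is a minimal sketch of the arithmetic described above, applied to the per-field scores from the "max of" node in the question's explain output. It assumes the usual disjunction-max combination (the best field score plus tie times the sum of the other field scores); the function name `dismax_score` is hypothetical and is not part of Solr or SolrNet.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores as: best + tie * (sum of the remaining scores)."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    # sum(field_scores) - best is the total of every non-maximum field score.
    return best + tie * (sum(field_scores) - best)


# Per-field scores copied from the "max of" node in the explain output.
field_scores = [
    73.23891,   # name_autocomplete
    51.871056,  # name_partial_match
    9.736896,   # custom_name_phonetic_en
    61.69854,   # custom_display_name_partial_match
    86.66015,   # custom_name_autocomplete
    9.267912,   # name_phonetic_en
    320.3168,   # name_lowercase
]

for tie in (0.0, 0.1, 1.0):
    print(f"tie={tie}: {dismax_score(field_scores, tie):.4f}")
# tie=0.0: 320.3168
# tie=0.1: 349.5641
# tie=1.0: 612.7903
```

With tie=0.0 the result matches the 320.3168 shown in the explain output, and with tie=1.0 every matching field contributes in full, which is the "total" behaviour the question asks for. In a request this would typically be passed as an extra query parameter (for example `tie=1.0`) alongside the existing DisMax/eDisMax field boosts. Note that the small custom_app weight in the explain output sits outside the "max of" node and is added on top in either case.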