
How can I rank results lower in SOLR if two fields match at the same time?


I have records with "title" and "brand" fields, and I query both fields.

Sometimes a record has the brand in the title, which results in a higher score, but I want such records to score the same as the others.

How can I rank records lower when both fields match?


Solution

  • Your solution is not ideal.

    In Solr, the Dismax query parser lets you search for individual terms across several fields, and it provides parameters that control how the per-field scores are combined into the final score.

    The q parameter defines the main query, while the qf parameter specifies the list of fields to search. In addition, the tie parameter controls how much the lower-scoring fields contribute to the final score compared to the highest-scoring field: with tie=0 only the highest-scoring field counts, and with tie=1 the scores of all matching fields are summed.

    Let's make a simple example.

    Using the standard query parser and searching for adidas in both fields (q=title:adidas OR brand:adidas), this is what you will obtain:

    http://localhost:8983/solr/indexName/select?q=title:adidas%20OR%20brand:adidas&fl=id,title,brand,score
    
    "docs": [
          {
            "id": "2",
            "title": "Shoes Adidas",
            "brand": "Adidas",
            "score": 0.9623127
          },
          {
            "id": "1",
            "title": "Shoes",
            "brand": "Adidas",
            "score": 0.31506687
          },
          {
            "id": "6",
            "title": "Shirt",
            "brand": "Adidas",
            "score": 0.31506687
          }
        ]
    

    The doc with id 2 has a higher score than the others because the score is the sum of two clauses ('adidas' in title + 'adidas' in brand).

    If you perform a Dismax query with tie=0 (a pure "disjunction max query"):

    http://localhost:8983/solr/indexName/select?defType=dismax&q=adidas&qf=brand%20title&fl=id,title,brand,score&tie=0
    

    You will obtain:

    "docs": [
          {
            "id": "2",
            "title": "Shoes Adidas",
            "brand": "Adidas",
            "score": 0.6472458
          },
          {
            "id": "1",
            "title": "Shoes",
            "brand": "Adidas",
            "score": 0.31506687
          },
          {
            "id": "6",
            "title": "Shirt",
            "brand": "Adidas",
            "score": 0.31506687
          }
        ]
    

    The doc with id 2 has a lower score than before because only the highest-scoring subquery contributes to the final score: the result is the maximum of the two field scores, 0.6472458 and 0.31506687, rather than their sum (0.9623127).
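
    The way tie combines the per-field scores can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Solr code: the function name dismax_score is hypothetical, and the assignment of 0.6472458 to title and 0.31506687 to brand is inferred from the results above (docs 1 and 6 match only on brand and score 0.31506687).

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores like a disjunction-max query:
    the best field score plus tie times the sum of the other scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


# Per-field scores for doc 2 (field assignment is an inference, see above).
title_score = 0.6472458
brand_score = 0.31506687

# tie=0: only the best field counts, as in the Dismax example.
print(dismax_score([title_score, brand_score], tie=0.0))

# tie=1: all field scores are summed, as with the standard parser's OR query.
print(dismax_score([title_score, brand_score], tie=1.0))
```

    Values of tie between 0 and 1 fall between these two extremes, so you can tune how much a second matching field is allowed to add.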

    With the qf parameter, it is also possible to assign a boost factor to increase or decrease the importance of a particular field in the query, for example:

    &qf=brand^3 title
    

    This boost makes matches in brand much more significant than matches in title.
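
    If you build the request from code, the boost syntax has to be URL-encoded. Here is a minimal sketch using only the Python standard library; the helper build_dismax_url and its arguments are hypothetical, and the host and index name are the placeholders used in the URLs above.

```python
from urllib.parse import urlencode


def build_dismax_url(base_url, query, field_boosts, tie=0.0,
                     fl="id,title,brand,score"):
    """Build a Dismax select URL; field_boosts maps field name -> boost."""
    # A boost of 1 is the default, so it is written without the ^ suffix.
    qf = " ".join(
        field if boost == 1 else f"{field}^{boost}"
        for field, boost in field_boosts.items()
    )
    params = {
        "defType": "dismax",
        "q": query,
        "qf": qf,
        "tie": tie,
        "fl": fl,
    }
    # urlencode percent-encodes the caret and encodes the space in qf.
    return f"{base_url}?{urlencode(params)}"


url = build_dismax_url(
    "http://localhost:8983/solr/indexName/select",
    "adidas",
    {"brand": 3, "title": 1},
)
print(url)
```

    Keeping the boosts in a dictionary makes it easy to try different values while you evaluate their effect on relevance.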

    In any case, boosting should be used with caution because it may lead to unexpected results. Every decision with boosting should be supported by an online and offline search relevance evaluation.

    Can this help you?