Elasticsearch: boosting relevance based on the count of a field value


I'm trying to boost relevance based on how often a field value occurs. The lower the count of a field value, the more relevant its documents should be.

For example, I have 1001 documents. 1000 documents are written by John, and only one is written by Joe.

// 1000 documents by John
{"title": "abc 1", "author": "John"}
{"title": "abc 2", "author": "John"}
// ...
{"title": "abc 1000", "author": "John"}

// 1 document by Joe
{"title": "abc 1", "author": "Joe"}

When I search for "abc" in the title field, I get all 1001 documents, and their relevance scores are very similar, if not identical. The count of the author value "John" is 1000 and the count of "Joe" is 1. I'd like to boost the relevance of the document {"title": "abc 1", "author": "Joe"}; otherwise, it is really hard to find the one document written by Joe.

Thank you!


Solution

  • In case someone runs into the same use case, here is my workaround using a Function Score Query. It needs at least two requests to the Elasticsearch server:

    1. Get the document count for each author (a terms aggregation on the author field works for this). In our example, that gives 1000 for John and 1 for Joe.
    2. Turn the counts into weights: the higher the count, the lower the weight. For example, 1 + sqrt(1/1000) ≈ 1.032 for John and 1 + sqrt(1/1) = 2 for Joe.
    3. Use the weights in a script_score script that scales each document's score by its author's weight. This assumes author is mapped as a keyword field so doc values are available (the script could certainly be made more general):

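      {
        "query": {
          "function_score": {
            "query": {
              "match": { "title": "abc" }
            },
            "script_score": {
              "script": {
                "source": "if (doc['author'].value == 'John') { return (1 + Math.sqrt(1.0 / 1000)) * _score; } return (1 + Math.sqrt(1.0 / 1)) * _score;"
              }
            },
            "boost_mode": "replace"
          }
        }
      }

    Note 1.0 rather than 1: with integer division, 1/1000 evaluates to 0 and John's weight collapses to 1. Since the script already multiplies by _score, boost_mode is set to replace; with the default (multiply), the query score would be applied twice. Older 5.x clusters use the key "inline" instead of "source".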
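The two-request workflow above can be sketched in Python as plain request bodies, with no client library, so it runs without a cluster. This is a minimal sketch under stated assumptions: the aggregation name author_counts and the helper functions are hypothetical illustrations, not part of the original answer; author is assumed to be a keyword field; and the Painless snippet has not been tested against a specific Elasticsearch version. Instead of hardcoding each author in the script, it passes the computed weights in as script params, so the query does not have to be rewritten when counts change.

```python
import json
import math


def author_count_request(size=1000):
    """Request 1: a terms aggregation counting documents per author.

    Assumes "author" is a keyword field (terms aggs need doc values).
    "author_counts" is a hypothetical aggregation name.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "aggs": {"author_counts": {"terms": {"field": "author", "size": size}}},
    }


def weights_from_buckets(buckets):
    """Step 2: map each bucket {"key": author, "doc_count": n} to 1 + sqrt(1 / n)."""
    return {b["key"]: 1 + math.sqrt(1.0 / b["doc_count"]) for b in buckets}


def weighted_query(text, weights, default_weight=1.0):
    """Request 2: function_score query scaling _score by the author's weight.

    Authors missing from `weights` (or documents without an author) get
    default_weight. The Painless source is an untested assumption.
    """
    painless = (
        "if (doc['author'].size() == 0) { return params.default_weight * _score; } "
        "String a = doc['author'].value; "
        "double w = params.weights.containsKey(a) "
        "? params.weights.get(a) : params.default_weight; "
        "return w * _score;"
    )
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": text}},
                "script_score": {
                    "script": {
                        "source": painless,
                        "params": {"weights": weights, "default_weight": default_weight},
                    }
                },
                # The script already multiplies by _score, so use its result
                # as the final score instead of multiplying a second time.
                "boost_mode": "replace",
            }
        }
    }


if __name__ == "__main__":
    # Relevant part of a terms-aggregation response, using the counts from
    # the example above (1000 docs by John, 1 by Joe).
    sample_response = {"aggregations": {"author_counts": {"buckets": [
        {"key": "John", "doc_count": 1000},
        {"key": "Joe", "doc_count": 1},
    ]}}}
    buckets = sample_response["aggregations"]["author_counts"]["buckets"]
    print(json.dumps(weighted_query("abc", weights_from_buckets(buckets)), indent=2))
```

Against a real cluster, you would send author_count_request() to the index's _search endpoint, pass the response's aggregations.author_counts.buckets to weights_from_buckets, and send the returned body as the second search. Authors beyond the terms aggregation's size limit fall back to default_weight.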