spring-boot elasticsearch opensearch

OpenSearch with Spring Boot - Displaying documents by number of keywords


I want to get OpenSearch results sorted in descending order by the number of keyword occurrences in each document. _score isn't what I am looking for: it is computed with the BM25 algorithm, which behaves more like an NLP-style relevance ranking than a plain count.

Example: I am searching for 2 keywords, 'happy' and 'cat'.

What I have: I am getting documents sorted by _score, which is not what I want, because a long text with 5 keywords is ranked lower than a short document with 2 keywords.

What I want: I want the long document with 5 keywords to be at the top and the document with 2 keywords to be below.

My solution now: I have a Java solution for it, but it creates a bottleneck for the API. I am basically counting the keywords in each document and then sorting the documents by that count. That takes 2 parallel streams, and I'm still blocking the API for way too long. I am looking for a 'pure' OpenSearch solution.
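For illustration, the client-side approach described above might look roughly like the sketch below. The class and method names (`KeywordCountSort`, `countKeywords`, `sortByKeywordCount`) are hypothetical, documents are assumed to be plain strings, and keywords are assumed to be lowercase single words; it is not the actual code from the project.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class KeywordCountSort {

    // Counts how many tokens of the text equal one of the keywords (case-insensitive).
    // Assumes the keywords are already lowercase single words.
    static long countKeywords(String text, List<String> keywords) {
        long count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (keywords.contains(token)) {
                count++;
            }
        }
        return count;
    }

    // Returns the documents ordered by keyword count, highest count first.
    static List<String> sortByKeywordCount(List<String> docs, List<String> keywords) {
        return docs.stream()
                .sorted(Comparator.comparingLong((String d) -> countKeywords(d, keywords)).reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> keywords = List.of("happy", "cat");
        List<String> docs = List.of(
                "A happy cat.",
                "The happy cat met another cat, and both were happy to see a third cat.");
        // Prints the documents with the highest keyword count first.
        sortByKeywordCount(docs, keywords).forEach(System.out::println);
    }
}
```

Every matching document has to be fetched and tokenized in the application before it can be ordered, which is where the latency comes from.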


Solution

  • I found the solution myself. I had to change the default Okapi BM25 similarity settings to the ones below. This has to be done when creating the index.

    BM25 params:

    • k1 is a tuning parameter (usually set between 1.2 and 2.0) that controls term frequency saturation. A higher k1 makes the score saturate more slowly, so additional occurrences of a term keep raising it for longer.

    • b is another tuning parameter (usually set around 0.75) that controls length normalization. A value of 1.0 fully normalizes by document length, while 0.0 disables length normalization.

       {
         "settings": {
           "index": {
             "similarity": {
               "default": {
                 "type": "BM25",
                 "k1": 2.0,
                 "b": 0.0
               }
             }
           }
         }
       }
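To see why these values change the ordering, here is a minimal Java sketch of the textbook BM25 term-frequency factor for a single term. It is a simplification, not the engine's actual scoring code: the IDF part is left out because it is the same for every document, and the document lengths, average length, and the name `Bm25Sketch` are made-up illustration values.

```java
class Bm25Sketch {

    // Textbook BM25 term-frequency factor for one term.
    // IDF is omitted because it does not depend on the individual document.
    static double tfFactor(double tf, double docLen, double avgDocLen, double k1, double b) {
        // b = 0 makes this 1.0, so document length no longer matters.
        double lengthNorm = 1.0 - b + b * (docLen / avgDocLen);
        return tf * (k1 + 1.0) / (tf + k1 * lengthNorm);
    }

    public static void main(String[] args) {
        double avgDocLen = 50.0; // hypothetical average document length in tokens

        // First the common defaults, then the settings from the answer.
        double[][] params = {{1.2, 0.75}, {2.0, 0.0}};
        for (double[] p : params) {
            // Short document: 2 occurrences in 10 tokens.
            double shortDoc = tfFactor(2, 10, avgDocLen, p[0], p[1]);
            // Long document: 5 occurrences in 200 tokens.
            double longDoc = tfFactor(5, 200, avgDocLen, p[0], p[1]);
            System.out.printf("k1=%.1f b=%.2f short=%.3f long=%.3f%n", p[0], p[1], shortDoc, longDoc);
        }
    }
}
```

With the defaults the short document scores higher, and with b set to 0.0 the long document with 5 occurrences moves ahead. The factor still saturates as the count grows, so for multi-term queries the resulting order approximates a raw keyword count rather than reproducing it exactly.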