
Rank by maximum bm25 score on a field of type array<string>


I have a schema that has a field of type array<string>:

field titles type array<string> { 
        indexing: index | summary | attribute
        index: enable-bm25
        attribute: fast-search
}

Say titles contains N titles: Title 1, Title 2, ..., Title N. I would like to rank documents by the maximum bm25 score between the query and any single title in titles. In other words, I would like the rank of the document to be max(bm25('Title 1'), bm25('Title 2'), ..., bm25('Title N')).

Just setting the ranking expression to bm25(titles) does not achieve this. For example, given a query Q with the terms term 1, term 2 and term 3, and two documents:

  • doc 1: {"titles": [".*term 1.*", ".*term 1.*", ".*term 1.*", ".*term 1.*"]}
  • doc 2: {"titles": [".*term 1 term 2 term 3.*", STRING_WITH_NONE_OF_THE_TERMS, STRING_WITH_NONE_OF_THE_TERMS, STRING_WITH_NONE_OF_THE_TERMS, STRING_WITH_NONE_OF_THE_TERMS]}

With bm25(titles) as the ranking expression, doc 1 ranks higher than doc 2. I assume this is because a query term appears in every title of doc 1, while in doc 2 the query terms appear in only one title. I want doc 2 to rank higher because it contains a title that almost completely matches the query. max(bm25) should therefore be higher for doc 2, even though the average or sum over all titles might be higher for doc 1.
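For reference, the setup described above corresponds to a rank profile along these lines. The profile name bm25_titles is hypothetical. bm25 requires index: enable-bm25 on the field, which the schema above already sets.

```
# Hypothetical profile name; place it inside the schema block that contains the titles field.
rank-profile bm25_titles {
    first-phase {
        # One bm25 score computed over all elements of titles combined, not per title
        expression: bm25(titles)
    }
}
```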

Is there a way I can achieve that in Vespa?


Solution

  • Thanks for the detailed question. Vespa does not support this with the bm25 rank feature: bm25(titles) is computed over all elements of the field combined, not per element.

    You can get similar functionality with the rank features designed for multi-valued fields, some of which score individual elements instead of the field as a whole. See https://docs.vespa.ai/en/searching-multi-valued-fields.html and https://docs.vespa.ai/en/reference/rank-features.html#features-for-indexed-multivalue-string-fields.

    Unrelated: unless you want to group on this field, you don't want to make it an attribute, since attributes keep all values in memory. Without it, the field definition becomes:

    field titles type array<string> { 
            indexing: index | summary 
            index: enable-bm25
    }
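This is a minimal sketch of a rank profile built on one of the multi-valued field features, assuming the titles field defined above. The profile name best_title is hypothetical. elementCompleteness(titles) is listed in the rank feature reference for indexed multi-value string fields and is computed for the best-matching element rather than for the field as a whole. That makes it approximate the max-over-titles behavior asked for, but it is not a bm25 score, so absolute values will differ. elementSimilarity(titles) is another element-level feature from the same reference that could be used instead.

```
# Hypothetical profile name; place it inside the schema block that contains the titles field.
rank-profile best_title {
    first-phase {
        # Completeness of the best-matching single element of titles,
        # rather than bm25 over all elements combined
        expression: elementCompleteness(titles).completeness
    }
    # Return these scores with each hit, to compare the element-level features against bm25(titles)
    match-features: elementCompleteness(titles).queryCompleteness elementSimilarity(titles) bm25(titles)
}
```

The profile would be selected at query time with the ranking=best_title request parameter. In the example from the question, the best title in doc 2 contains all three query terms, while each title in doc 1 contains only one. Doc 2 should therefore get the higher queryCompleteness. If term rarity still matters, bm25(titles) could be added to the expression as a smaller secondary signal, with the weighting tuned on real queries.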