vespa

Using weakAnd with an array<string> attribute


I have a document that has the following two fields:

    field label type string {
        indexing: summary | index | attribute
        index: enable-bm25
        attribute: fast-search
    }

    field titles type array<string> { 
        indexing: index | summary | attribute
        index: enable-bm25
        attribute: fast-search
    }

and this is the rank profile I'm using:

    rank-profile profile {
        first-phase {
            expression: max(textSimilarity(label),elementSimilarity(titles))
        }
    }

I want to retrieve documents where either the label or one of the values in titles contains the words "Hot" and "Springs".

The top document should be this one, since its label consists of exactly the two words "Hot" and "Springs":

{
   "label": "Hot Springs",
   "titles": []
}

I'm running a query like so:

select * from DOC_NAME where weakAnd (label contains "Hot", label contains "Springs", titles contains "Hot", titles contains "Springs")

but this document doesn't even appear in the top 10 results.

If I run the query with OR instead of weakAnd it works as expected:

select * from DOC_NAME where label contains "Hot" OR label contains "Springs" OR titles contains "Hot" OR titles contains "Springs"

This returns the document above as the top result.

Even if I have the query like this:

select * from DOC_NAME where weakAnd (label contains "Hot", label contains "Springs") OR weakAnd( titles contains "Hot", titles contains "Springs")

I still get the doc above as the top result.

Why wouldn't weakAnd produce this document as a top result?


Solution

  • From the weakAnd documentation:

    The reason for increasing the target number is that weakAnd uses a ranking function internally (inner product) and the hits which are evaluated by the weakAnd scorer is also evaluated by the first-phase ranking expression. Anything similar to classic vector ranking should correlate well with weakAnd inner product scoring, e.g. nativeFieldMatch or bm25 ranking features. Note that because weakAnd relies on feedback identifying which hits are used for first phase ranking to increase its threshold for what's considered a good hit, the special unranked rank profile (which turns off ranking completely) may cause weakAnd queries to become slower than using a real rank profile.

    In your case, this fails because your first-phase ranking expression is a max over textSimilarity and elementSimilarity, which does not correlate well with the inner product scoring weakAnd uses internally. You will probably have better success with a different first-phase expression (e.g. nativeRank) and your max moved to a second-phase expression. Or use OR, which does not attempt to be smart.
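
    The split between phases suggested above might look roughly like this. It is a minimal, untested sketch: it keeps the profile name and the max expression from the question, assumes nativeRank over both fields as the first-phase expression, and uses a rerank-count of 100 purely as an illustrative value.

    ```
    rank-profile profile {
        # First phase: a feature that should correlate with weakAnd's
        # internal scoring (assumption: nativeRank over both fields).
        first-phase {
            expression: nativeRank(label, titles)
        }
        # Second phase: the original max expression, applied only to the
        # best first-phase hits (100 is an arbitrary example value).
        second-phase {
            expression: max(textSimilarity(label), elementSimilarity(titles))
            rerank-count: 100
        }
    }
    ```

    Since both fields already have enable-bm25 set, a bm25-based first phase such as bm25(label) + bm25(titles) would be another candidate to try.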

    This is unrelated, but your field definitions don't make much sense. If you need free-text matching capabilities, consider this instead:

    field label type string {
        indexing: summary | index
        index: enable-bm25
    }