
Recall returns nothing when querying rank-profile


I have a sample Vespa instance and want to train a LightGBM model on the rank features produced by a rank profile, following https://docs.vespa.ai/documentation/learning-to-rank.html

However, whenever I specify the recall with the docID, I get 0 hits. I'm using the example code from here: https://github.com/vespa-engine/sample-apps/blob/master/text-search/src/python/collect_training_data.py

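# create_request_top_hits() and get_features() come from collect_training_data.py;
# url is the search endpoint of the Vespa instance
body = create_request_top_hits("test", "training", hits=2)
get_features(url, body)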

And this correctly returns:

[{'id': 'index:domains/0/944f3a850511f388fe97ac85',
  'relevance': 1.2427330381582673,
  'source': 'domains',
  'fields': {'uri': '6202597992',
   'rankfeatures': {'bm25(body)': 2.8145480372957787,
    'nativeFieldMatch(categories)': 0.0,
    'nativeFieldMatch(concepts)': 0.8591903630989031,
    'nativeFieldMatch(links)': 0.0,
    'nativeFieldMatch(title)': 0.0,
    'nativeProximity(categories)': 0.0,
    'nativeProximity(concepts)': 0.0,
    'nativeProximity(links)': 0.0,
    'nativeProximity(title)': 0.0,
    'rankingExpression(time_ranking)': 1.0}}},
 {'id': 'index:domains/0/93f92aae1d6a010c2111e9b7',
  'relevance': 1.2010786365413106,
  'source': 'domains',
  'fields': {'uri': '6206270866',
   'rankfeatures': {'bm25(body)': 2.0397289658724347,
    'nativeFieldMatch(categories)': 0.0,
    'nativeFieldMatch(concepts)': 0.8591903630989031,
    'nativeFieldMatch(links)': 0.0,
    'nativeFieldMatch(title)': 0.0,
    'nativeProximity(categories)': 0.0,
    'nativeProximity(concepts)': 0.0,
    'nativeProximity(links)': 0.0,
    'nativeProximity(title)': 0.0,
    'rankingExpression(time_ranking)': 1.0}}}]
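Since the goal is to train LightGBM on these features, each hit's `rankfeatures` dictionary has to become one row with a fixed column order. The following is a minimal sketch, assuming hits shaped like the output above; `hits_to_rows` is a hypothetical helper, not part of the sample script, and relevance labels (which a learner also needs) are not shown here and would have to come from your training data.

```python
def hits_to_rows(hits):
    """Flatten the rankfeatures of each hit into (uri, values) rows.

    Feature names are collected across all hits and sorted, so every row
    uses the same column order, as a tabular learner such as LightGBM expects.
    """
    names = sorted({name for hit in hits for name in hit["fields"]["rankfeatures"]})
    rows = []
    for hit in hits:
        features = hit["fields"]["rankfeatures"]
        # A feature missing from one hit defaults to 0.0 so all rows have equal length.
        rows.append((hit["fields"]["uri"], [features.get(n, 0.0) for n in names]))
    return names, rows


# Trimmed version of the two hits returned above (only two features kept).
hits = [
    {"id": "index:domains/0/944f3a850511f388fe97ac85",
     "fields": {"uri": "6202597992",
                "rankfeatures": {"bm25(body)": 2.8145480372957787,
                                 "nativeFieldMatch(concepts)": 0.8591903630989031}}},
    {"id": "index:domains/0/93f92aae1d6a010c2111e9b7",
     "fields": {"uri": "6206270866",
                "rankfeatures": {"bm25(body)": 2.0397289658724347,
                                 "nativeFieldMatch(concepts)": 0.8591903630989031}}},
]
names, rows = hits_to_rows(hits)
```

Keying rows on `uri` rather than the internal `index:...` hit id keeps them joinable with whatever identifiers the training labels use.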

To see if recall works, we'll use the top result:

'id': 'index:domains/0/944f3a850511f388fe97ac85'
'uri': '6202597992'  # docIDs are derived from the uri field

Then I set the recall to the docID:

doc_id = [6202597992, "6202597992", "944f3a850511f388fe97ac85"]  # multiple representations...
body = create_request_specific_ids("test", "training", doc_id)
get_features(url, body)

I would expect this to return the rank features from before, but instead I get 0 hits. This is the full response:

{'root': {'id': 'toplevel', 'relevance': 1.0, 'fields': {'totalCount': 0}, 'coverage': {'coverage': 100, 'documents': 798, 'full': True, 'nodes': 5, 'results': 5, 'resultsFull': 5}}}
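The response above already narrows the problem down: `totalCount` is 0 while coverage is 100 with `full: True`, which suggests every document was searched and none matched, so the cause is more likely the query or recall than the cluster. A small check along these lines is sketched below; `diagnose_empty_result` is a hypothetical helper that reads only the fields visible in this response.

```python
def diagnose_empty_result(response):
    """Classify a Vespa JSON result using totalCount and the coverage block."""
    root = response["root"]
    total = root["fields"]["totalCount"]
    cov = root["coverage"]
    if total > 0:
        return f"{total} hit(s)"
    if cov["full"] and cov["coverage"] == 100:
        # Every node answered and all documents were covered: nothing matched.
        return "no match: full coverage, so the query/recall matched nothing"
    return "no hits, but coverage was partial: some nodes may not have answered"


# The response returned above.
response = {'root': {'id': 'toplevel', 'relevance': 1.0, 'fields': {'totalCount': 0},
                     'coverage': {'coverage': 100, 'documents': 798, 'full': True,
                                  'nodes': 5, 'results': 5, 'resultsFull': 5}}}
print(diagnose_empty_result(response))
```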

I've checked the docs and examples but haven't found anything about this. Any insights would be greatly appreciated.


Solution

  • The collect script/function expects your document schema to have a field called id. If you alter the script to use the uri field instead, you should be able to retrieve the documents.
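A minimal sketch of that change, assuming the script builds its recall string from `id:<docid>` terms wrapped in `+( ... )`: only the field prefix is swapped to `uri`. `build_recall` is a hypothetical helper, not a function from the sample script, and the exact string format the script uses should be checked against its source.

```python
def build_recall(doc_ids, field="uri"):
    """Build a recall string such as '+(uri:6202597992 uri:6206270866)'.

    The format mirrors an id-based recall ('+(id:... id:...)') with the field
    name made configurable; doc_ids may be ints or strings.
    """
    terms = " ".join(f"{field}:{doc_id}" for doc_id in doc_ids)
    return f"+({terms})"


recall = build_recall(["6202597992"])
```

In the script, a string built this way would replace the `id:`-based value passed as the `recall` parameter, so it matches on the `uri` field that actually exists in the schema.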