
Batch check document existence in Vespa


I have a list of document IDs and want to check whether they exist in Vespa. If a document exists, I want to return a specific field from it. Currently, I'm doing this sequentially. Sample code in Python:

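import requests

# document/v1 path: /document/v1/<namespace>/<document-type>/docid/<id>
doc_urlbase = 'http://localhost:8080/document/v1/test/test/docid'
docid_list = [1, 2, 3, 4, 5]
field_name = 'myfield'  # hypothetical: the field to return
result = {}
for docid in docid_list:
    doc_url = '{}/{}'.format(doc_urlbase, docid)
    req = requests.get(doc_url)
    if req.status_code == 200:
        # docid is in Vespa, save the field value
        result[docid] = req.json()['fields'].get(field_name, '')
    else:
        # display not found (a missing document returns 404)
        print('{} not found'.format(docid))
        result[docid] = ''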

I'm hoping there's a better way to do this, one that returns an array or map as the result. Something like:

Query given:
    docid_list = [1,2,3,4,5]

Return:
    {
        1: "field value",
        2: "field value",
        3: "",             # not in Vespa
        4: "field value",
        5: "field value",
    }

Thanks!


Solution

  • If your list is large relative to the corpus, you can use vespa-visit to quickly dump all document IDs and then match the two sets.

    I assume that is not the case. If you do this frequently, you can create a component, such as a Searcher or a Handler, that you POST the ID list to. In the component, use the Java Document API to Get each ID and create a Hit for each match. Each such Get takes on the order of milliseconds, so this will be quicker; the tradeoff is that you have to write some code.

    You can also run the same code from a standalone Java program.
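The first option (dump everything with vespa-visit, then match the sets) could look roughly like the Python sketch below. It assumes the dump has already been parsed into dicts shaped like a Document API GET response, i.e. with an `id` such as `id:test:test::1` and a `fields` map, and that the IDs are plain ones without `n=`/`g=` modifiers. `build_result`, `user_id` and `myfield` are hypothetical names for illustration, not part of any Vespa API.

```python
def user_id(vespa_doc_id):
    """Return the user-specified part of a Vespa document ID.

    Assumes plain IDs of the form 'id:<namespace>:<doctype>::<user-specified>',
    so the part after the first '::' is the ID used in the /document/v1 URL.
    """
    return vespa_doc_id.split('::', 1)[1]


def build_result(dumped_docs, docid_list, field_name):
    """Match a list of wanted IDs against dumped documents.

    dumped_docs: iterable of dicts with 'id' and 'fields' keys (assumed shape).
    Returns {docid: field value, or '' if the document is not in the dump}.
    """
    # Index the dump once by user-specified ID (kept as a string).
    by_id = {user_id(doc['id']): doc.get('fields', {}) for doc in dumped_docs}
    result = {}
    for docid in docid_list:
        fields = by_id.get(str(docid))
        # Missing document -> '' (the format asked for in the question)
        result[docid] = '' if fields is None else fields.get(field_name, '')
    return result


# Hypothetical dump: documents 1, 2, 4 and 5 exist, 3 does not.
dump = [
    {'id': 'id:test:test::1', 'fields': {'myfield': 'a'}},
    {'id': 'id:test:test::2', 'fields': {'myfield': 'b'}},
    {'id': 'id:test:test::4', 'fields': {'myfield': 'd'}},
    {'id': 'id:test:test::5', 'fields': {'myfield': 'e'}},
]
print(build_result(dump, [1, 2, 3, 4, 5], 'myfield'))
# prints {1: 'a', 2: 'b', 3: '', 4: 'd', 5: 'e'}
```

Indexing the dump once makes each lookup constant time, so the cost is dominated by the dump itself; that is why this approach only pays off when the list is large relative to the corpus.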