How to perform a full-text search in Vespa?


I am trying to do a full-text search on a field of some documents, and I am looking for advice on how to do so. I first tried this type of request:

GET http://localhost:8080/search/?query=lord+of+the+rings

But it only returned the documents where the field was an exact match, containing nothing other than the given string, so I tried the equivalent in YQL:

GET http://localhost:8080/search/?yql=SELECT * FROM site WHERE text CONTAINS "lord of the rings";

And I got the exact same results. But on reading the documentation further I came across the MATCHES operator, and it does give me the results I seem to be looking for, with this kind of request:

GET http://localhost:8080/search/?yql=SELECT * FROM site WHERE text MATCHES "lord of the rings";

However, for some requests of this type I encountered a timeout error like the following, and I don't know why:

{
    "root": {
        "id": "toplevel",
        "relevance": 1,
        "fields": {
            "totalCount": 0
        },
        "errors": [
            {
                "code": 12,
                "summary": "Timed out",
                "source": "site",
                "message": "Timeout while waiting for sc0.num0"
            }
        ]
    }
}

So I worked around this issue by setting a timeout value greater than the default:

GET http://localhost:8080/search/?yql=SELECT * FROM site WHERE text MATCHES "lord of the rings";&timeout=20000

My question is: am I doing full-text search the right way, and how could I improve it?

EDIT: Here is the corresponding search definition:

search site {

    document site {

        field text type string {
            stemming: none
            normalizing: none
            indexing: attribute
        }

        field title type string {
            stemming: none
            normalizing: none
            indexing: attribute
        }
    }

    fieldset default {
        fields: title, text
    }

    rank-profile post inherits default {
        rank-type text: about
        rank-type title: about
        first-phase {
            expression: nativeRank(title, text)
        }
    }
}

Solution

  • What does your search definition file look like? I suspect you have put your text content in an "attribute" field, which defaults to "word match" semantics: the query has to match the whole field value. You probably want "text match" semantics, which means you'll need to put your content in an "index" field.

    https://docs.vespa.ai/documentation/reference/search-definitions-reference.html#match

    The "MATCHES" operator you are using interprets your input as a regular expression. This is powerful but slow, because the regular expression has to be evaluated against the attribute value of every document (further optimizations along the lines of https://swtch.com/~rsc/regexp/regexp4.html are possible but not currently implemented). That is the likely reason for the timeouts you are seeing.
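
The attribute-to-index change suggested above could look roughly like this for the two fields from the question. This is a minimal sketch rather than a tested definition: the "summary" part is an assumption added so that the fields are returned in results, and the stemming and normalizing settings from the question are left out so that the defaults apply, which may or may not be what you want.

```
search site {

    document site {

        # "index" gives text match semantics (the content is tokenized into terms).
        # "summary" is an assumption, not part of the original definition.
        field text type string {
            indexing: index | summary
        }

        field title type string {
            indexing: index | summary
        }
    }

    fieldset default {
        fields: title, text
    }
}
```

With the fields indexed this way, the CONTAINS query from the question would be expected to match documents that contain the terms, not only documents whose field value equals the whole string, so MATCHES and the raised timeout should no longer be needed. A change like this typically requires redeploying the application and feeding the documents again.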