
Indexing in Vespa is slow


Indexing documents into a local Vespa instance is slow.

My configuration:

<container id="default" version="1.0">
    <search />
    <document-api />
    <nodes>
        <node hostalias="node1" />
    </nodes>
</container>

<content id="bo" version="1.0">
    <redundancy>1</redundancy>
    <documents>
        <document type="psearch" mode="index" />
    </documents>
    <nodes>
        <node hostalias="node1" distribution-key="0" />
    </nodes>
</content>

and schema:

schema psearch {
    document psearch {
        field Id type int {
            indexing: summary | attribute
            attribute: fast-search
        }
        field Name type string {
            indexing: summary | index | attribute
            index: enable-bm25
        }
        field AdId type string {
            indexing: summary | index | attribute
            index: enable-bm25
        }
        field Country type string {
            indexing: summary | index | attribute
            index: enable-bm25
        }
        field Avatar type string {
            indexing: summary | index | attribute
            index: enable-bm25
        }
        field Value type long {
            indexing: summary | attribute
            attribute: fast-search
        }
        field Numbers type int {
            indexing: summary | attribute
            attribute: fast-search
        }
        field BotLastTime type long {
            indexing: summary | attribute
            attribute: fast-search
        }
        field BotDailyCount type int {
            indexing: summary | attribute
            attribute: fast-search
        }
        field Platform type string {
            indexing: summary | index | attribute
            index: enable-bm25
        }
    }

    fieldset default {
        fields: Id, Name, AdId, Country, Avatar, Numbers, BotLastTime, BotDailyCount, Platform
    }

    rank-profile default {
        first-phase {
            expression: nativeRank(Id, Name, AdId, Country, Avatar, Numbers, BotLastTime, BotDailyCount, Platform)
        }
    }
}

I use the /document/v1 API to push documents into Vespa (a POST request that puts a given document by ID): https://docs.vespa.ai/en/reference/document-v1-api-reference.html
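For reference, a single put of this kind might look like the sketch below. The endpoint `http://localhost:8080` and the namespace `mynamespace` are assumptions for illustration and are not taken from my setup; the document type `psearch` and the field names come from the schema above.

```python
import json
import urllib.parse
import urllib.request

# Assumptions for illustration only: local endpoint and namespace name.
ENDPOINT = "http://localhost:8080"
NAMESPACE = "mynamespace"
DOC_TYPE = "psearch"  # document type from the schema


def document_url(doc_id):
    """Build the /document/v1 URL for one document ID."""
    quoted_id = urllib.parse.quote(str(doc_id), safe="")
    return f"{ENDPOINT}/document/v1/{NAMESPACE}/{DOC_TYPE}/docid/{quoted_id}"


def build_put_request(doc_id, fields):
    """Create (but do not send) the POST request that puts one document."""
    body = json.dumps({"fields": fields}).encode("utf-8")
    return urllib.request.Request(
        document_url(doc_id),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def put_document(doc_id, fields):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_put_request(doc_id, fields)) as response:
        return response.status
```

Calling `put_document(1, {"Id": 1, "Name": "example"})` against a running instance would send one document; timing a loop of such calls measures latency per request rather than the throughput the node can sustain.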

In my tests on local Vespa, pushing 100k documents takes around 2.3 milliseconds per document.

I ran the same test with Elasticsearch, where the average time is around 1.7 milliseconds per document. I am trying to find a way to get at least the same performance from Vespa as from Elasticsearch.
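To put the two figures side by side, here is a small calculation. It assumes the documents are pushed one at a time, so that the time per document is simply the inverse of the throughput; the function names are only illustrative.

```python
def throughput_docs_per_sec(ms_per_doc):
    """Documents per second if each document takes ms_per_doc milliseconds."""
    return 1000.0 / ms_per_doc


def total_seconds(num_docs, ms_per_doc):
    """Total wall-clock time for num_docs sequential pushes."""
    return num_docs * ms_per_doc / 1000.0


# Per-document times reported in the two tests (100k documents each).
for name, ms in [("Vespa", 2.3), ("Elasticsearch", 1.7)]:
    rate = throughput_docs_per_sec(ms)
    total = total_seconds(100_000, ms)
    print(f"{name}: {rate:.0f} docs/s, {total:.0f} s for 100k documents")
```

Under that assumption the numbers work out to roughly 435 versus 588 documents per second, or about 230 versus 170 seconds for the whole test.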

Any idea how I can improve the time per document push?


Solution

  • Did you try using vespa-feed-client (https://docs.vespa.ai/en/vespa-feed-client.html)? It is optimized for throughput and is normally the best client for pushing indexing load. This question was also asked at https://github.com/vespa-engine/vespa/issues/25715, where more answers can be found.
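As a rough sketch of how to prepare input for the feed client, the snippet below writes the documents to a JSON feed file of put operations. The namespace `mynamespace` and the helper names are assumptions for illustration; the document type and the `Id` field come from the schema in the question.

```python
import json

# Assumption for illustration only: the namespace is not given in the question.
NAMESPACE = "mynamespace"
DOC_TYPE = "psearch"  # document type from the schema


def to_put_operation(fields):
    """Wrap one document's fields in a Vespa JSON put operation."""
    doc_id = f"id:{NAMESPACE}:{DOC_TYPE}::{fields['Id']}"
    return {"put": doc_id, "fields": fields}


def write_feed_file(path, documents):
    """Write all documents as a JSON array of put operations; return the count."""
    operations = [to_put_operation(doc) for doc in documents]
    with open(path, "w", encoding="utf-8") as feed_file:
        json.dump(operations, feed_file)
    return len(operations)
```

The resulting file could then be handed to the feed client, which as far as I know takes `--file` and `--endpoint` options; check the linked documentation for the exact command line.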