Indexing documents into my local Vespa instance is slow.
My configuration: `
<container id="default" version="1.0">
<search />
<document-api />
<nodes>
<node hostalias="node1" />
</nodes>
</container>
<content id="bo" version="1.0">
<redundancy>1</redundancy>
<documents>
<document type="psearch" mode="index" />
</documents>
<nodes>
<node hostalias="node1" distribution-key="0" />
</nodes>
</content>
`
And my schema:
schema psearch {
document psearch {
field Id type int {
indexing: summary | attribute
attribute: fast-search
}
field Name type string {
indexing: summary | index | attribute
index: enable-bm25
}
field AdId type string {
indexing: summary | index | attribute
index: enable-bm25
}
field Country type string {
indexing: summary | index | attribute
index: enable-bm25
}
field Avatar type string {
indexing: summary | index | attribute
index: enable-bm25
}
field Value type long {
indexing: summary | attribute
attribute: fast-search
}
field Numbers type int {
indexing: summary | attribute
attribute: fast-search
}
field BotLastTime type long {
indexing: summary | attribute
attribute: fast-search
}
field BotDailyCount type int {
indexing: summary | attribute
attribute: fast-search
}
field Platform type string {
indexing: summary | index | attribute
index: enable-bm25
}
}
fieldset default {
fields: Id, Name, AdId, Country, Avatar, Numbers, BotLastTime, BotDailyCount, Platform
}
rank-profile default {
first-phase {
expression: nativeRank(Id, Name, AdId, Country, Avatar, Numbers, BotLastTime, BotDailyCount, Platform)
}
}
}
I use the /document/v1 API to push documents into Vespa (a POST to put a given document by ID): https://docs.vespa.ai/en/reference/document-v1-api-reference.html
In my tests on local Vespa it takes around 2.3 milliseconds to push one document, in a test where I push 100k documents.
I ran the same test with Elasticsearch and the average time is around 1.7 milliseconds. I am trying to find a way of getting at least the same performance as in Elasticsearch.
Any idea how I can improve the time for each document push?
Did you try using https://docs.vespa.ai/en/vespa-feed-client.html? It is optimized for throughput and is normally the best client for pushing indexing load. This question was also asked at https://github.com/vespa-engine/vespa/issues/25715, where more answers can be found.
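If the 100k documents in your test were pushed one request at a time, the 2.3 ms figure is mostly per-request latency rather than indexing throughput. As a rough illustration of the idea behind a throughput-oriented client (this is not the feed client itself), here is a minimal Python sketch that keeps many /document/v1 POSTs in flight at once. The endpoint `http://localhost:8080`, the namespace `mynamespace`, the worker count, and the helper names `document_url`, `post_document` and `feed` are all assumptions made for the example; only the document type `psearch` comes from your schema.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumptions for illustration only (not from the original post).
ENDPOINT = "http://localhost:8080"  # assumed local container endpoint
NAMESPACE = "mynamespace"           # hypothetical document namespace
DOC_TYPE = "psearch"                # document type from the schema in the question


def document_url(doc_id, endpoint=ENDPOINT, namespace=NAMESPACE, doc_type=DOC_TYPE):
    """Build the /document/v1 URL for a single document ID."""
    quoted = urllib.parse.quote(str(doc_id), safe="")
    return f"{endpoint}/document/v1/{namespace}/{doc_type}/docid/{quoted}"


def post_document(doc_id, fields):
    """POST one document and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    body = json.dumps({"fields": fields}).encode("utf-8")
    request = urllib.request.Request(
        document_url(doc_id),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def feed(documents, post=post_document, workers=32):
    """Feed (doc_id, fields) pairs concurrently; results are in input order.

    Any exception raised by a single POST is re-raised here by result().
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(post, doc_id, fields) for doc_id, fields in documents]
        return [future.result() for future in futures]
```

Against a running instance, something like `feed([(i, {"Id": i, "Name": f"user-{i}"}) for i in range(1000)])` would send the documents with up to 32 requests in flight. Note that `urllib` opens a new connection for every request, so this only demonstrates the concurrency idea; a proper feeder reuses connections, which is part of what the feed client linked above handles for you.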