We have a cluster consisting of 3 master nodes (4 cores, 16 GB RAM each), 3 hot nodes (8 cores, 32 GB RAM, 300 GB SSD each), and 3 warm nodes (8 cores, 32 GB RAM, 1.5 TB HDD each).
We have one index for each month of the year, following the naming convention voucher_YYYY_MMM (e.g. voucher_2021_JAN). All of these indexes share the alias voucher, which acts as a read alias, and our search queries are directed at this read alias.
Each index resides on the hot nodes for 32 days, which is the period in which it receives 99% of its writes. We estimate approximately 480 million docs in this index; it has 1 replica and 16 shards. (We chose 16 shards because our data will eventually grow; right now we are thinking of shrinking down to 8 shards, each with 30 GB of data, since with our mapping 2 million docs take about 1 GB of space.)
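To make the sizing arithmetic above explicit, here is a minimal Python sketch. The function names are made up for illustration; the only inputs are the figures quoted in the post (480 million docs per monthly index, roughly 2 million docs per GB, 1 replica).

```python
# Rough shard-sizing arithmetic using the numbers from the post.
DOCS_PER_GB = 2_000_000  # from the post: ~2 million docs take ~1 GB with our mapping


def index_size_gb(doc_count, docs_per_gb=DOCS_PER_GB):
    """Approximate primary storage of one index, in GB."""
    return doc_count / docs_per_gb


def shard_size_gb(doc_count, primary_shards, docs_per_gb=DOCS_PER_GB):
    """Approximate size of a single primary shard, in GB."""
    return index_size_gb(doc_count, docs_per_gb) / primary_shards


monthly_docs = 480_000_000
primary_gb = index_size_gb(monthly_docs)         # 240.0 GB of primaries
with_replica_gb = primary_gb * 2                 # 480.0 GB including 1 replica
per_shard_16 = shard_size_gb(monthly_docs, 16)   # 15.0 GB per shard today
per_shard_8 = shard_size_gb(monthly_docs, 8)     # 30.0 GB per shard after shrinking

print(primary_gb, with_replica_gb, per_shard_16, per_shard_8)
```

So the 30 GB per shard figure corresponds to 8 primaries, and the current 16 primaries come out at about 15 GB each.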
After 32 days the index moves to the warm nodes. Currently we have 450 million docs in our hot index and 1.8 billion docs collectively in our warm indexes, for a total of 2.25 billion docs.
Our docs contain a customer id and some fields on which we apply filters; they are all mapped as keyword types. We use custom routing on the customer id to improve our search speed.
Our typical query looks like this:
GET voucher/_search?routing=1000636779&search_type=query_then_fetch
{
  "from": 0,
  "size": 20,
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "filter": [
            {
              "term": {
                "uId": {
                  "value": "1000636779",
                  "boost": 1
                }
              }
            },
            {
              "terms": {
                "isGift": [
                  "false"
                ]
              }
            }
          ]
        }
      }
    }
  },
  "version": true,
  "sort": [
    {
      "cdInf.crtdAt": {
        "order": "desc"
      }
    }
  ]
}
We are using a constant score query because we don't want to score our documents and want to increase search speed.
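For reference, the same request can be assembled programmatically. This is only a sketch: build_voucher_search is a hypothetical helper, not part of any client library, and it just returns the query-string parameters and the JSON body so they can be handed to whatever HTTP or Elasticsearch client is in use. The field names (uId, isGift, cdInf.crtdAt) are taken from the query above.

```python
import json


def build_voucher_search(customer_id, page_size=20):
    """Return (params, body) for a routed search against the voucher alias.

    Hypothetical helper for illustration; it mirrors the query in the post.
    """
    # The routing value is the customer id, so only the shard holding this
    # customer's docs is searched in each index behind the alias.
    params = {"routing": customer_id, "search_type": "query_then_fetch"}
    body = {
        "from": 0,
        "size": page_size,
        "query": {
            "constant_score": {
                "filter": {
                    "bool": {
                        "filter": [
                            {"term": {"uId": {"value": customer_id, "boost": 1}}},
                            {"terms": {"isGift": ["false"]}},
                        ]
                    }
                }
            }
        },
        "version": True,
        "sort": [{"cdInf.crtdAt": {"order": "desc"}}],
    }
    return params, body


params, body = build_voucher_search("1000636779")
print(json.dumps(body, indent=2))
```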
We have 13 search threads on each of our hot and warm nodes, and we are sending requests to our master node for both indexing and searching.
We are sending 100 search requests per second and getting an average search response time of about 3.5 seconds, with a maximum of up to 9 seconds.
I don't understand what we are missing. Why is our search performance so poor?
Thank you for the exhaustive explanations. Based on them here are a few points of improvement (in no particular order):
cdInf.crtdAt field. Faster searches at the cost of slower ingestion, but it only makes sense if your queries have a time constraint, otherwise not.
voucher alias, but that would also be good information to have in order to assess whether the sharding and the size of the search thread pool are appropriate. Based on the doc counts you provide, it seems you have 1 hot index and 5 warm ones, so 6 indexes in total. So each search request with routing will search only 6 shards.
preference query string parameter
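The fan-out reasoning above (6 indexes, so 6 shards per routed search) can be turned into a back-of-the-envelope calculation. This is a sketch under loud assumptions: every index behind the alias is assumed to still have 16 primary shards, and the load is assumed to spread evenly over all 6 data nodes, which is unlikely in practice since the warm nodes hold most of the indexes. The function name is made up for illustration.

```python
# Figures from the question and the answer above.
INDEXES_BEHIND_ALIAS = 6        # 1 hot + 5 warm, as estimated above
PRIMARY_SHARDS_PER_INDEX = 16   # ASSUMPTION: all indexes still have 16 primaries
SEARCHES_PER_SECOND = 100
SEARCH_THREADS_PER_NODE = 13
DATA_NODES = 6                  # 3 hot + 3 warm


def shards_per_search(indexes, primaries_per_index, routed):
    """Shards one search touches: one per index with routing, all otherwise."""
    return indexes * (1 if routed else primaries_per_index)


routed = shards_per_search(INDEXES_BEHIND_ALIAS, PRIMARY_SHARDS_PER_INDEX, True)
unrouted = shards_per_search(INDEXES_BEHIND_ALIAS, PRIMARY_SHARDS_PER_INDEX, False)

# Shard-level queries the cluster must execute every second.
shard_queries_per_second = SEARCHES_PER_SECOND * routed

# ASSUMPTION: perfectly even spread over every search thread in the cluster.
total_search_threads = SEARCH_THREADS_PER_NODE * DATA_NODES

# Average time a shard-level query may take before the search queues back up.
max_avg_shard_seconds = total_search_threads / shard_queries_per_second

print(routed, unrouted, shard_queries_per_second, total_search_threads)
print(round(max_avg_shard_seconds, 2))
```

Under these assumptions, routing cuts the fan-out from 96 shards to 6 per request, and 100 requests per second translate into 600 shard-level queries per second spread over 78 search threads.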