I have two documents in my index. One contains the field:
name: foo bar
and the other contains:
name: foo xyz bar xyz foo xyz bar xyz foo xyz bar xyz foo xyz bar
I'm using an ngram analyzer like this:
"analysis": {
  "analyzer": {
    "ngram_analyzer": {
      "tokenizer": "ngram_tokenizer"
    }
  },
  "tokenizer": {
    "ngram_tokenizer": {
      "type": "ngram",
      "min_gram": 3,
      "max_gram": 3,
      "token_chars": [
        "letter",
        "digit",
        "whitespace"
      ]
    }
  }
}
When I search for foo bar, the first document gets a higher score than the second. This is what I want, but can anybody explain how this scoring works? As far as I know, the ngram tokenizer splits the text into 3-character terms, so how does it find out that foo and bar are in sequence in the first document and assign it a higher score?
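Relevance scoring in Elasticsearch is not the easiest topic when you are starting out. Score calculation is based on three main parts: term frequency, inverse document frequency, and the field-length norm.
In short: the more often a term occurs in the field, the more relevant the document; the more often a term occurs across the whole index, the less that term counts; and the longer the field, the less a match in it counts.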
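As a rough illustration of how term frequency, inverse document frequency, and the field-length norm combine, here is a minimal Python sketch. It assumes the classic Lucene TF/IDF formulas (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), norm = 1 / sqrt(numTerms)) and ignores query normalization and coordination factors. Recent Elasticsearch versions use BM25 by default, so the real numbers will differ. The helper names `trigrams` and `classic_score` are made up for this example, and the simple sliding window only stands in for the ngram tokenizer on text made of letters, digits, and whitespace.

```python
import math

def trigrams(text):
    # Simplified stand-in for the ngram tokenizer (min_gram = max_gram = 3)
    # on text that contains only letters, digits, and whitespace.
    return [text[i:i + 3] for i in range(len(text) - 2)]

docs = {
    "doc1": "foo bar",
    "doc2": "foo xyz bar xyz foo xyz bar xyz foo xyz bar xyz foo xyz bar",
}

def classic_score(query, doc_id):
    doc_terms = trigrams(docs[doc_id])
    # Field-length norm: longer fields get a smaller weight.
    norm = 1 / math.sqrt(len(doc_terms))
    score = 0.0
    for term in set(trigrams(query)):
        freq = doc_terms.count(term)
        if freq == 0:
            continue
        # Number of documents in the (tiny) index that contain the term.
        doc_freq = sum(1 for text in docs.values() if term in trigrams(text))
        tf = math.sqrt(freq)
        idf = 1 + math.log(len(docs) / (doc_freq + 1))
        score += tf * idf ** 2 * norm
    return score

for doc_id in docs:
    print(doc_id, round(classic_score("foo bar", doc_id), 3))
```

Under these assumptions the short document comes out ahead: the second document has higher term frequencies, but its much longer field shrinks the norm by more than the frequencies gain.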
I recommend reading the Elasticsearch documentation on relevance scoring for the details.
Additionally, the score depends on the type of query you are using. For example, with a match query, the search term foo bar suits the foo bar document, whose name field is much shorter, better than the second one.
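On the question of how word order can matter at all: because "whitespace" is listed in token_chars, the trigrams can span the space between words. The Python sketch below approximates that tokenizer to show which query terms each document contains. The function `ngram_tokens` is a hypothetical helper written for this example, not Elasticsearch's implementation, and it treats Python's `isalpha`, `isdigit`, and `isspace` as stand-ins for the letter, digit, and whitespace character classes.

```python
def ngram_tokens(text, n=3):
    # Split the text into runs of allowed characters (letter, digit,
    # whitespace), then slide a window of length n over each run.
    tokens = []
    run = []
    for ch in text + "\0":  # sentinel character flushes the last run
        if ch.isalpha() or ch.isdigit() or ch.isspace():
            run.append(ch)
        else:
            tokens.extend("".join(run[i:i + n]) for i in range(len(run) - n + 1))
            run = []
    return tokens

doc1 = "foo bar"
doc2 = "foo xyz bar xyz foo xyz bar xyz foo xyz bar xyz foo xyz bar"

query_terms = ngram_tokens("foo bar")
doc2_terms = set(ngram_tokens(doc2))

# Query terms that occur in the first document but not in the second.
only_in_first = [t for t in query_terms if t not in doc2_terms]

print(query_terms)
print(only_in_first)
```

In this sketch the term "o b" exists only where foo is immediately followed by bar, so only the first document matches every query term. That is one plausible way adjacency ends up influencing the score even though a match query does not check positions.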