I'm trying to understand how ranking works in Elasticsearch. All the fields in my index are defined with the "english" analyzer.
This is my query:
GET test_index_1/study/_search/
{
  "query": {
    "multi_match": {
      "query": "stupid question",
      "type": "best_fields",
      "fields": ["description", "title", "questions.text"]
    }
  }
}
Below are the returned results. I have only 3 documents in the test index.
I wonder why the 1st document has about twice the score of the 2nd.
Intuitively, the "title" and "description" fields are "equal", so why does a match in "title" give a higher score?
"hits": {
  "total": 3,
  "max_score": 1.7600523,
  "hits": [
    {
      "_index": "test_index_1",
      "_type": "study",
      "_id": "AV28gnhD1DC3_uN8bTrd",
      "_score": 1.7600523,
      "_source": {
        "title": "stupid question",
        "description": "test test",
        "questions": [
          {
            "text": "stupid text"
          }
        ]
      }
    },
    {
      "_index": "test_index_1",
      "_type": "study",
      "_id": "AV28gomD1DC3_uN8bTre",
      "_score": 0.84339964,
      "_source": {
        "title": "test test",
        "description": "stupid question",
        "questions": [
          {
            "text": "stupid text"
          }
        ]
      }
    },
    {
      "_index": "test_index_1",
      "_type": "study",
      "_id": "AV28gpPT1DC3_uN8bTrf",
      "_score": 0.84339964,
      "_source": {
        "title": "test test",
        "description": "stupid question",
        "questions": [
          {
            "text": "no text"
          }
        ]
      }
    }
  ]
}
Thanks in advance for any hint.
Elasticsearch uses an inverted index and scores matches with TF-IDF (newer versions default to BM25, which also has an inverse document frequency component). Terms that occur in fewer documents are given more weight, and these statistics are computed separately for each field. The words "stupid" and "question" occur in only one title (the first result), but in two descriptions (the second and third results), so a match on "stupid question" in the title is worth more because the terms are rarer there. That is why the first document gets the higher score.
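To make the per-field rarity effect concrete, here is a rough sketch in Python. It assumes Lucene's classic TF-IDF form, idf = 1 + ln(numDocs / (docFreq + 1)), and uses plain whitespace splitting instead of the "english" analyzer. The helper names (doc_freq, classic_idf) are made up for illustration. It ignores field norms, query normalization, coordination and sharding, so it will not reproduce the exact _score values above; it only shows that the same terms get a larger IDF in "title" than in "description".

```python
import math

# The three documents from the question (only the two fields being compared).
docs = [
    {"title": "stupid question", "description": "test test"},
    {"title": "test test", "description": "stupid question"},
    {"title": "test test", "description": "stupid question"},
]

def doc_freq(field, term):
    """Number of documents whose `field` contains `term` (whitespace tokens only)."""
    return sum(1 for d in docs if term in d[field].split())

def classic_idf(field, term):
    """Assumed classic Lucene form: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(len(docs) / (doc_freq(field, term) + 1))

# IDF of each query term, computed separately per field.
idf_by_field = {
    field: {term: classic_idf(field, term) for term in ("stupid", "question")}
    for field in ("title", "description")
}

for field, idfs in idf_by_field.items():
    print(field, {t: round(v, 4) for t, v in idfs.items()})
# title {'stupid': 1.4055, 'question': 1.4055}
# description {'stupid': 1.0, 'question': 1.0}
```

To see the real breakdown for your own index rather than this approximation, add "explain": true to the search request body; each hit then comes back with an _explanation section listing the term frequency, IDF and norm values that went into its score.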