I would like to implement an intranet site search with the help of Elasticsearch, but I can't find a query that meets all my needs.
Here are the criteria that I would like to apply to my search when searching for 2+ words:
Here's a demo of my search query that you can play with online: https://www.found.no/play/gist/6df91cb4ed1f2b4b7328
When I search for "toll collector", I get the results in this order:
But why is the exact match in third place? Why not in first position? What I want is this result:
Your query doesn't take word order into account.
To do so, you need to add "type": "phrase" to your query. This does the same thing as replacing "match" with "match_phrase".
You then get a single document, your desired #1.
To allow in-between words, you add "slop": 2
You then get the first three desired documents, in the right order. But the "fuzziness" parameter seems to have no effect in phrase mode.
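As a minimal sketch of the two steps above, here is the phrase query written as a Python dict that can be serialized and sent as the search body. The helper name `phrase_query` is made up for illustration; the `description` field is the one used in the query further down.

```python
import json


def phrase_query(field, text, slop=0):
    """Build a match_phrase search body (hypothetical helper).

    With slop=0 the terms must appear next to each other, in order;
    a larger slop tolerates words in between.
    """
    return {"query": {"match_phrase": {field: {"query": text, "slop": slop}}}}


# Same settings as discussed above: phrase matching with up to 2 positions of slop.
body = phrase_query("description", "toll collector", slop=2)
print(json.dumps(body, indent=2))
```

The printed JSON is equivalent to the YAML form used in the demo, so it can be pasted there or passed to whichever client you use.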
To also get the "connector" answers, you can group the two queries in a "should" clause:
    query:
      bool:
        should:
          - match_phrase:
              description:
                query: "toll collector"
                slop: 2
          - match:
              description:
                query: "toll collector"
                fuzziness: 2
This adds the "connector" answers, but their score does not take the in-between words into account.
To do so, you would need some kind of distance score that encapsulates both phrase sloppiness and word fuzziness. I don't know if this is implemented, but if it exists, it's going to be computationally expensive for order-2 edits on both sides.
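To make the idea concrete, here is a toy sketch in plain Python of what such a combined distance could look like. It is not how Elasticsearch scores anything: the function names and the cost model (one point per character edit, one point per skipped word, query word order preserved) are assumptions chosen purely for illustration.

```python
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def combined_distance(query, text, max_edits=2):
    """Smallest (character edits + skipped words) over in-order alignments.

    Returns None when some query term has no candidate within max_edits,
    or when the candidates cannot be aligned in query order.
    """
    q_terms, d_terms = query.lower().split(), text.lower().split()
    candidates = []
    for q in q_terms:
        # (position, edits) for every document term close enough to q
        hits = [(pos, edit_distance(q, d)) for pos, d in enumerate(d_terms)]
        hits = [(p, e) for p, e in hits if e <= max_edits]
        if not hits:
            return None
        candidates.append(hits)
    best = None
    # Brute force over all candidate combinations: this is the expensive part.
    for combo in product(*candidates):
        positions = [p for p, _ in combo]
        if any(b <= a for a, b in zip(positions, positions[1:])):
            continue  # query word order must be preserved
        gaps = sum(b - a - 1 for a, b in zip(positions, positions[1:]))
        cost = sum(e for _, e in combo) + gaps
        if best is None or cost < best:
            best = cost
    return best


print(combined_distance("toll collector", "toll collector"))       # 0
print(combined_distance("toll collector", "toll road connector"))  # 3
print(combined_distance("toll collector", "collector toll"))       # None
```

Even in this toy version, every query term is compared against every document term, and every combination of candidates is tried, which gives a feel for why doing this inside the search engine over a whole index would be costly.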