I am trying to write an Elasticsearch multi-match query (with the Java API) to create a "search-as-you-type" program. The query is applied to two fields, title
and description
, which are analyzed as ngrams.
My problem is, it seems that Elasticsearch tries to find only words beginning like my query. For instance, if I search for "nut", then it matches with documents featuring "nut", "nuts", "Nutella", etc, but it does not match documents featuring "walnut", which should be matched.
Here are my settings :
{
"index": {
"analysis": {
"analyzer": {
"edgeNGramAnalyzer": {
"tokenizer": "edgeTokenizer",
"filter": [
"word_delimiter",
"lowercase",
"unique"
]
}
},
"tokenizer": {
"edgeTokenizer": {
"type": "edgeNGram",
"min_gram": "3",
"max_gram": "8",
"token_chars": [
"letter",
"digit"
]
}
}
}
}
}
Here is the relevant part of my mapping :
{
"content": {
"properties": {
"title": {
"type": "text",
"analyzer": "edgeNGramAnalyzer",
"fields": {
"sort": {
"type": "keyword"
}
}
},
"description": {
"type": "text",
"analyzer": "edgeNGramAnalyzer",
"fields": {
"sort": {
"type": "keyword"
}
}
}
}
}
}
And here is my query :
new MultiMatchQueryBuilder(query).field("title", 3).field("description", 1).fuzziness(0).tieBreaker(1).minimumShouldMatch("100%")
Do you have any idea what I could be doing wrong ?
That's because you're using an edgeNGram
tokenizer instead of nGram
one. The former only indexes prefixes, while the latter will index prefixes, suffixes and also sub-parts of your data.
Change your analyzer definition to this instead and it should work as expected:
{
"index": {
"analysis": {
"analyzer": {
"edgeNGramAnalyzer": {
"tokenizer": "edgeTokenizer",
"filter": [
"word_delimiter",
"lowercase",
"unique"
]
}
},
"tokenizer": {
"edgeTokenizer": {
"type": "nGram", <---- change this
"min_gram": "3",
"max_gram": "8",
"token_chars": [
"letter",
"digit"
]
}
}
}
}
}