I'm trying to implement a search system in which I need to use Edge NGRAM Tokenizer. Settings for creating an index are shown below. I have used same tokenizer for both documents and search query. (Documents are in Perisan language)
PUT /test
{
"settings": {
"analysis": {
"analyzer": {
"autocomplete": {
"tokenizer": "autocomplete",
"filter": [
"lowercase"
]
},
"autocomplete_search": {
"tokenizer": "autocomplete"
}
},
"tokenizer": {
"autocomplete": {
"type": "edge-ngram",
"min_gram": 2,
"max_gram": 10,
"token_chars": [
"letter"
]
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"title": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "autocomplete_search"
}
}
}
}
}
The problem shows up when I get 0 hits (results) from searching term 'آلمانی' in docs while I have a doc with data : 'آلمان خوب است'.
As you can see the result for analyzing term 'آلمانی' shows that it generates token 'آلمان' and works properly.
{
"tokens" : [
{
"token" : "آ",
"start_offset" : 0,
"end_offset" : 6,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "آل",
"start_offset" : 0,
"end_offset" : 6,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "آلم",
"start_offset" : 0,
"end_offset" : 6,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "آلما",
"start_offset" : 0,
"end_offset" : 6,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "آلمان",
"start_offset" : 0,
"end_offset" : 6,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "آلمانی",
"start_offset" : 0,
"end_offset" : 6,
"type" : "<ALPHANUM>",
"position" : 0
}
]
}
The searching query shown below gets 0 hits.
GET /test/_search
{
"query": {"match": {
"title": {"query": "آلمانی" , "operator": "and"}
}}
}
However searching term 'آلما' returns doc with data 'آلمان خوب است'. How can I fix this problem?
Your assistance would be greatly appreciated.
I found this DevTicks post by Ricardo Heck which solved my problem. enter the link for more detailed description
I changed my mapping setting like this:
"mappings": {
"_doc": {
"properties": {
"title": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "autocomplete_search",
"fields": {
"ngram": {
"type": "text",
"analyzer": "autocomplete"
}
}
}
}
}
}
And now I get doc "آلمان خوب است" by searching the term "آلمانی".