I'm working with AWS Elasticsearch. In my project, reports need to be searchable by keywords such as "corona virus".
The results should include documents containing "Corona virus", "corona", "virus", or "coronavirus".
Please guide me on how I should build my query DSL.
Note: I'm working in PHP.
I appreciate your help.
//Amit
You need to use a shingle token filter.
A token filter of type shingle that constructs shingles (token n-grams) from a token stream. In other words, it creates combinations of tokens as a single token. For example, the sentence "please divide this sentence into shingles" might be tokenized into shingles "please divide", "divide this", "this sentence", "sentence into", and "into shingles".
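To make the idea concrete, here is a minimal Python sketch of shingling. It is not Elasticsearch code: the function name `shingles` is hypothetical, and a plain whitespace split stands in for the tokenizer.

```python
def shingles(tokens, size=2, separator=" "):
    """Return every run of `size` adjacent tokens, joined by `separator`."""
    return [
        separator.join(tokens[i:i + size])
        for i in range(len(tokens) - size + 1)
    ]


sentence = "please divide this sentence into shingles"
# Whitespace split is only a stand-in for a real tokenizer.
print(shingles(sentence.split()))
```

With the default size of 2 this prints the five two-word shingles listed in the example sentence above.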
Mapping:
PUT index91
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "shingle_filter"
          ]
        }
      },
      "filter": {
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3,
          "output_unigrams": true,
          "token_separator": ""
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
Data:
POST index91/_doc
{
  "title": "corona virus"
}
Query:
GET index91/_search
{
  "query": {
    "match": {
      "title": "coronavirus"
    }
  }
}
Result:
"hits" : [
  {
    "_index" : "index91",
    "_type" : "_doc",
    "_id" : "gNmUZHEBrJsHVOidaoU_",
    "_score" : 0.9438393,
    "_source" : {
      "title" : "corona virus"
    }
  }
]
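It will also work for the queries "corona", "corona virus", and "virus". Because "token_separator" is "" and "output_unigrams" is true, the title "corona virus" is indexed as the tokens "corona", "coronavirus", and "virus", so each of these queries shares at least one token with the document.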
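The matching behaviour can be checked offline with a rough Python simulation of the analyzer. This is only a sketch under assumptions: the `analyze` function is hypothetical, a `\w+` regex approximates the standard tokenizer plus the lowercase filter, and the parameter defaults mirror the shingle_filter settings from the mapping. For the real token stream, Elasticsearch's _analyze API is the authoritative check.

```python
import re


def analyze(text, min_size=2, max_size=3, output_unigrams=True, separator=""):
    """Approximate my_analyzer: tokenize, lowercase, then build shingles."""
    # Rough stand-in for the standard tokenizer + lowercase filter.
    tokens = re.findall(r"\w+", text.lower())
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        # Emit every shingle of min_size..max_size tokens starting at i.
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(separator.join(tokens[i:i + size]))
    return out


indexed = analyze("corona virus")
print("indexed tokens:", indexed)

for query in ["coronavirus", "corona", "virus", "corona virus"]:
    # A match query hits the document if any query token was indexed.
    shared = sorted(set(analyze(query)) & set(indexed))
    print(query, "->", shared)
```

Every query in the loop shares at least one token with the indexed title, which is why the match query returns the document in each case.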