
AWS Elasticsearch: search should match all combinations of the given query terms


I'm working with AWS Elasticsearch. In my project's reports I need to search for keywords such as "corona virus".

The results should include documents containing any of "Corona virus", "corona", "virus", and "coronavirus".

Please advise how I should build my query DSL.

Note: I'm working in PHP.

I appreciate your help.

//Amit


Solution

  • You need to use the shingle token filter.

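    A token filter of type shingle constructs shingles (token n-grams) from a token stream. In other words, it emits combinations of adjacent tokens as single tokens. For example, the sentence "please divide this sentence into shingles" might be tokenized into the shingles "please divide", "divide this", "this sentence", "sentence into", and "into shingles". In the mapping below, token_separator is set to an empty string, so the shingle built from "corona virus" is the single token "coronavirus", and output_unigrams keeps "corona" and "virus" as separate tokens as well.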
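To make the shingle idea concrete, here is a minimal Python sketch that imitates the filter on whitespace-split, lowercased tokens. It is only an approximation for illustration: the function name `shingles` and its parameters are hypothetical, its defaults are chosen to reproduce the sentence example above rather than to mirror Elasticsearch's defaults, and the real standard tokenizer does more than split on whitespace.

```python
def shingles(text, min_size=2, max_size=2, separator=" ", output_unigrams=False):
    """Rough emulation of a shingle token filter (illustrative, not the real implementation)."""
    tokens = text.lower().split()  # assumption: whitespace split stands in for the tokenizer
    result = []
    for start in range(len(tokens)):
        if output_unigrams:
            result.append(tokens[start])  # keep the single word itself
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                # join `size` adjacent tokens into one shingle
                result.append(separator.join(tokens[start:start + size]))
    return result


# The sentence example from the text: two-word shingles joined by a space.
print(shingles("please divide this sentence into shingles"))

# Settings resembling the mapping in this answer: sizes 2 to 3, unigrams kept, empty separator.
print(shingles("corona virus", min_size=2, max_size=3, separator="", output_unigrams=True))
```

With the empty separator, the two-word input yields the joined token "coronavirus" next to the individual words, which is what lets a search for "coronavirus" find a document that only contains "corona virus".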

    Mapping:

    PUT index91
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "standard",
              "filter": [
                "lowercase",
                "shingle_filter"
              ]
            }
          },
          "filter": {
            "shingle_filter": {
              "type": "shingle",
              "min_shingle_size": 2,
              "max_shingle_size": 3,
              "output_unigrams": true,
              "token_separator": ""
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "title": {
            "type": "text",
            "analyzer": "my_analyzer"
          }
        }
      }
    }
    
    

    Data:

    POST index91/_doc
    {
      "title":"corona virus"
    }
    

    Query:

    GET index91/_search
    {
      "query": {
        "match": {
          "title": "coronavirus"
        }
      }
    }
    

    Result:

    "hits" : [
          {
            "_index" : "index91",
            "_type" : "_doc",
            "_id" : "gNmUZHEBrJsHVOidaoU_",
            "_score" : 0.9438393,
            "_source" : {
              "title" : "corona virus"
            }
          }
    ]
    

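    The same mapping also returns this document for the queries "corona", "corona virus", and "virus": the match query analyzes the search text with the field's analyzer, and output_unigrams keeps the single words alongside the joined shingle "coronavirus".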
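As a rough check of why all four searches hit, the sketch below approximates my_analyzer in plain Python and lists the terms each query shares with the indexed title. The helpers `analyze` and `matches` are hypothetical names for this illustration. It assumes the match query uses its default OR behavior, so a single shared term is enough, and it ignores scoring and the finer details of the standard tokenizer.

```python
def analyze(text):
    """Approximate my_analyzer: lowercase, split on whitespace, keep unigrams,
    and add 2- and 3-token shingles joined with an empty separator."""
    tokens = text.lower().split()
    terms = set(tokens)
    for size in (2, 3):
        for start in range(len(tokens) - size + 1):
            terms.add("".join(tokens[start:start + size]))
    return terms


def matches(document, query):
    """Assumed match semantics: a hit when the document and query share any analyzed term."""
    return bool(analyze(document) & analyze(query))


indexed = analyze("corona virus")
for query in ["Corona virus", "corona", "virus", "coronavirus"]:
    # Show which analyzed terms the query has in common with the indexed title.
    print(query, "->", sorted(analyze(query) & indexed))
```

Under the same assumptions the reverse case works too: a document whose title is the single word "coronavirus" is matched by the query "corona virus", because the query side also produces the joined shingle.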