
Elasticsearch - Apply an appropriate analyzer to get accurate results


I am new to Elasticsearch. I would like to apply an analyzer that satisfies the search behaviour described below. Let's take an example. Suppose I have entered the text below in a document:

  1. I am walking now
  2. I walked to Ahmedabad
  3. Everyday I walk in the morning
  4. Anil walks in the evening.
  5. I am hiring candidates
  6. I hired candidates
  7. Everyday I hire candidates
  8. He hires candidates

Now when I search with

  1. text "walking" result should be [walking, walked, walk, walks]
  2. text "walked" result should be [walking, walked, walk, walks]
  3. text "walk" result should be [walking, walked, walk, walks]
  4. text "walks" result should be [walking, walked, walk, walks]

The same should hold for hire.

  1. text "hiring" result should be [hiring, hired, hire, hires]
  2. text "hired" result should be [hiring, hired, hire, hires]
  3. text "hire" result should be [hiring, hired, hire, hires]
  4. text "hires" result should be [hiring, hired, hire, hires]

Thank You,


Solution

  • You need to use the stemmer token filter.

    Stemming is the process of reducing a word to its root form. This ensures variants of a word match during a search.

    For example, walking and walked can be stemmed to the same root word: walk. Once stemmed, an occurrence of either word would match the other in a search.

    Mapping

    PUT index36
    {
      "mappings": {
        "properties": {
          "title":{
            "type": "text",
            "analyzer": "my_analyzer"
          }
        }
      }, 
      "settings": {
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "whitespace",
              "filter": ["lowercase", "stemmer"]
            }
          }
        }
      }
    }
    

    Analyze

    GET index36/_analyze
    {
      "text": ["walking", "walked", "walk", "walks"],
      "analyzer": "my_analyzer"
    }
    

    Result

    {
      "tokens" : [
        {
          "token" : "walk",
          "start_offset" : 0,
          "end_offset" : 7,
          "type" : "word",
          "position" : 0
        },
        {
          "token" : "walk",
          "start_offset" : 8,
          "end_offset" : 14,
          "type" : "word",
          "position" : 101
        },
        {
          "token" : "walk",
          "start_offset" : 15,
          "end_offset" : 19,
          "type" : "word",
          "position" : 202
        },
        {
          "token" : "walk",
          "start_offset" : 20,
          "end_offset" : 25,
          "type" : "word",
          "position" : 303
        }
      ]
    }
    
    

    All four words produce the same token, "walk", so a search for any one of them matches the others.
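
    Search

    To check this end to end, a minimal sketch is to index the sentences from the question and run a match query. It assumes each sentence is stored as its own document in the "title" field of index36 (the question does not name a field, and the document IDs are arbitrary). A match query analyzes the search text with the field's analyzer, so "walking" is stemmed to "walk" before the lookup.

    ```
    POST index36/_bulk?refresh=true
    { "index": { "_id": "1" } }
    { "title": "I am walking now" }
    { "index": { "_id": "2" } }
    { "title": "I walked to Ahmedabad" }
    { "index": { "_id": "3" } }
    { "title": "Everyday I walk in the morning" }
    { "index": { "_id": "4" } }
    { "title": "Anil walks in the evening." }
    { "index": { "_id": "5" } }
    { "title": "I am hiring candidates" }
    { "index": { "_id": "6" } }
    { "title": "I hired candidates" }
    { "index": { "_id": "7" } }
    { "title": "Everyday I hire candidates" }
    { "index": { "_id": "8" } }
    { "title": "He hires candidates" }

    GET index36/_search
    {
      "query": {
        "match": {
          "title": "walking"
        }
      }
    }
    ```

    With the mapping above, this query should return documents 1 to 4. Searching for "walked", "walk" or "walks" should return the same set, since they all analyze to "walk". The hire variants can be verified the same way by passing them to the _analyze request first.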