
Elasticsearch - Unable to search using fuzzy match query for underscore in value (fuzzy query not matching a value containing an underscore)


Suppose I have three documents in my Elasticsearch index, for example:

1: {
    "name": "test_2602"
   }
2: {
    "name": "test-2602"
   }
3: {
    "name": "test 2602"
   }

When I search using the following fuzzy match query:

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": [
              {
                "match": {
                  "name": {
                    "query": "test-2602",
                    "fuzziness": "2",
                    "prefix_length": 0,
                    "max_expansions": 50,
                    "fuzzy_transpositions": true,
                    "lenient": false,
                    "zero_terms_query": "NONE",
                    "boost": 1
                  }
                }
              }
            ],
            "disable_coord": false,
            "adjust_pure_negative": true,
            "boost": 1
          }
        }
      ],
      "disable_coord": false,
      "adjust_pure_negative": true,
      "boost": 1
    }
  }
}

In the response I get only two documents (the same happens whether I search for "test", "test 2602", or "test-2602"):

  {
    "name": "test-2602"
  },
  {
    "name": "test 2602"
  }

I am not getting the document whose name is "test_2602" (values containing an underscore do not match). I want the results to include that third document as well. However, if I search for "test_2602", the response contains only:

 {
   "name": "test_2602"
 }

I need to fetch all three documents whenever I search for "test", "test 2602", "test-2602", or "test_2602".


Solution

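  • You get only two documents because text fields use the standard analyzer by default. It tokenizes "test-2602" and "test 2602" into the two tokens test and 2602, but it does not split on underscores, so "test_2602" is indexed as the single token test_2602. A match query for "test-2602" searches for the tokens test and 2602, and neither is within 2 edits of test_2602 (each needs five insertions), so the fuzziness does not help.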

    You can check the generated tokens with the Analyze API:

    GET /_analyze
    {
      "analyzer" : "standard",
      "text" : "test_2602"
    }
    

    It returns a single token:

    {
      "tokens": [
        {
          "token": "test_2602",
          "start_offset": 0,
          "end_offset": 9,
          "type": "<ALPHANUM>",
          "position": 0
        }
      ]
    }
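For intuition outside Elasticsearch, here is a rough Python approximation of what the standard analyzer does to the three values. It assumes that the regex `\w+` (letters, digits and underscore) is a fair stand-in for the standard tokenizer on plain ASCII input; the real tokenizer follows the Unicode word-boundary rules (UAX #29), so treat this as a sketch rather than a reimplementation. `standard_analyze` is a hypothetical helper name.

```python
import re


def standard_analyze(text):
    """Approximate the standard analyzer for plain ASCII text (sketch only).

    Assumption: runs of word characters (letters, digits, underscore) mimic
    the standard tokenizer's word boundaries; tokens are then lowercased,
    like the analyzer's lowercase filter.
    """
    return [token.lower() for token in re.findall(r"\w+", text)]


for value in ["test_2602", "test-2602", "test 2602"]:
    # The underscore is a word character, so "test_2602" stays one token,
    # while the hyphen and the space act as token separators.
    print(value, "->", standard_analyze(value))
```

The first printed line, `test_2602 -> ['test_2602']`, agrees with the Analyze API response, while the other two values each produce `['test', '2602']`.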
    

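    Query the name.keyword sub-field instead of name (notice the .keyword suffix). A keyword field is not analyzed: the whole value is indexed as a single term, so the fuzziness applies to the entire string, and "test_2602" is only one edit away from "test-2602" and "test 2602". The mapping needs a keyword sub-field for this; dynamic mapping adds name.keyword by default for string values, or you can define it explicitly. Try the following mapping and query.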

    Index Mapping:

    {
      "mappings": {
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
    

    Search Query:

    {
      "query": {
        "match": {
          "name.keyword": {
            "query": "test_2602",
            "fuzziness":2
          }
        }
      }
    }
    

    Search Result:

    "hits": [
          {
            "_index": "66572330",
            "_type": "_doc",
            "_id": "1",
            "_score": 0.9808291,
            "_source": {
              "name": "test_2602"
            }
          },
          {
            "_index": "66572330",
            "_type": "_doc",
            "_id": "3",
            "_score": 0.8718481,
            "_source": {
              "name": "test 2602"
            }
          },
          {
            "_index": "66572330",
            "_type": "_doc",
            "_id": "2",
            "_score": 0.8718481,
            "_source": {
              "name": "test-2602"
            }
          }
        ]
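To see why the keyword field fixes the underscore case, the following Python sketch applies an edit-distance check to whole values instead of tokens. Assumptions: fuzziness is modeled as optimal string alignment distance (insertions, deletions, substitutions and adjacent transpositions, since `fuzzy_transpositions` defaults to true) with a maximum of 2 edits and `prefix_length` 0. Lucene actually evaluates this with Levenshtein automata, so this is only an approximation. `osa_distance` and `keyword_fuzzy_hits` are hypothetical helper names.

```python
def osa_distance(a, b):
    """Optimal string alignment distance between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
            # Swapping two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def keyword_fuzzy_hits(query, values, max_edits=2):
    """Return the values whose whole, unanalyzed string is within max_edits of query."""
    return [v for v in values if osa_distance(query, v) <= max_edits]


values = ["test_2602", "test-2602", "test 2602"]
for query in ["test_2602", "test-2602", "test 2602", "test"]:
    print(repr(query), "->", keyword_fuzzy_hits(query, values))
```

Each stored value is one substitution away from the other two, so the first three queries return all three values. The last query returns an empty list: "test" is five insertions away from every stored value and fuzziness cannot exceed 2 edits, so a search for just "test" against name.keyword will not return these documents.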