Elasticsearch fuzzy search phrase with dash


I am trying to find a way to index a document with a description like "In-N-Out Burger" and then search for "in n out", "in and out", or simply "in-n-out", and have each of those return the "In-N-Out Burger" document. After reading the documentation I am still confused about how a dash is handled at index time and at search time. Any suggestions?

My current settings and mapping:

curl -XPUT http://localhost:9200/objects -d '{
    "settings": {
        "analysis": {
            "analyzer": {
                "lower": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": [ "lowercase" ] 
                }
            }
        }
    }
}'

curl -XPUT http://localhost:9200/objects/object/_mapping -d '{
    "object" : {
        "properties" : {
            "objectDescription" : {
                "type" : "string",
                "fields" : {
                    "lower": {
                        "type": "string",
                        "analyzer": "lower"
                    }
                }
            },
            "suggest" : {
                "type" : "completion",
                "analyzer" : "simple",
                "search_analyzer" : "simple",
                "payloads" : true
            }
        }
    }
}'

Solution

  • I didn't run into any problems when I created the index with your settings and indexed a document:

    curl -XPUT http://localhost:9200/objects/object/001 -d '{
      "description": "In-N-Out Burger",
      "name" : "first_document"
    }'
    

    Then I searched for it:

    curl -XGET 'localhost:9200/objects/object/_search?q=in+and+out&pretty'
    {
      "took" : 6,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 1,
        "max_score" : 0.05038611,
        "hits" : [ {
          "_index" : "objects",
          "_type" : "object",
          "_id" : "001",
          "_score" : 0.05038611,
          "_source" : {
            "description" : "In-N-Out Burger",
            "name" : "first_document"
          }
        } ]
      }
    }
    

    or

    curl -XGET 'localhost:9200/objects/object/_search?pretty&q=in-n-out'
    {
      "took" : 8,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 1,
        "max_score" : 0.23252454,
        "hits" : [ {
          "_index" : "objects",
          "_type" : "object",
          "_id" : "001",
          "_score" : 0.23252454,
          "_source" : {
            "description" : "In-N-Out Burger",
            "name" : "first_document"
          }
        } ]
      }
    }
    

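    As you can see, the document is found in both cases. The default (standard) analyzer treats '-' as a delimiter and splits the phrase into tokens, both when the document is indexed and when the query is analyzed. The query "in and out" still matches because query terms are combined with OR by default, so the tokens "in" and "out" are enough for a hit; that is most likely also why its score is lower than for "in-n-out". You can see the tokenization at work: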

    curl -XGET 'localhost:9200/objects/_analyze?pretty=true' -d 'In-N-Out Burger'
    {
      "tokens" : [ {
        "token" : "in",
        "start_offset" : 0,
        "end_offset" : 2,
        "type" : "<ALPHANUM>",
        "position" : 0
      }, {
        "token" : "n",
        "start_offset" : 3,
        "end_offset" : 4,
        "type" : "<ALPHANUM>",
        "position" : 1
      }, {
        "token" : "out",
        "start_offset" : 5,
        "end_offset" : 8,
        "type" : "<ALPHANUM>",
        "position" : 2
      }, {
        "token" : "burger",
        "start_offset" : 9,
        "end_offset" : 15,
        "type" : "<ALPHANUM>",
        "position" : 3
      } ]
    }
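
To make the tokenization argument concrete without a running cluster, here is a minimal shell sketch that mimics what the analyzer output above shows. It is only a rough stand-in for the standard analyzer on plain ASCII text, not Elasticsearch's real implementation, and the helper names `tokenize` and `shared_tokens` are hypothetical, introduced just for this illustration.

```shell
# Rough stand-in for the standard analyzer on plain ASCII text (an assumption,
# not the real implementation): lowercase the input, then split it on any run
# of non-alphanumeric characters, so '-' and ' ' both act as delimiters.
tokenize() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]' '\n' | sed '/^$/d'
}

# Count how many query tokens also occur among the document tokens.
# With OR semantics a single shared token is already enough for a hit.
shared_tokens() {
  doc_tokens=$(tokenize "$1")
  count=0
  for t in $(tokenize "$2"); do
    if printf '%s\n' "$doc_tokens" | grep -qxF -- "$t"; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}

# Tokens of the indexed description, one per line.
tokenize 'In-N-Out Burger'

# Compare the three query variants from the question against the document.
for q in 'in n out' 'in and out' 'in-n-out'; do
  echo "$q: $(shared_tokens 'In-N-Out Burger' "$q") shared token(s)"
done
```

In this sketch "in n out" and "in-n-out" produce the same tokens and share all three with the document, while "in and out" shares only "in" and "out". That lines up with the results above: every variant finds the document, and the "in and out" query gets the lower score.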