
Bring back all relevant results when using n-grams with Elasticsearch


I indexed my Elasticsearch index with n-grams so that fuzzy matching and prefix searches would be fast. However, when I search for documents containing "Bob" in the name field, only documents whose name is exactly Bob are returned. I would like the response to include documents with name = Bob, but also documents with name = Bobbi, Bobbette, and so on. The Bob results should have a relatively high score. The other results, which don't match exactly, should still appear in the result set, but with lower scores. How can I achieve this with n-grams?

I am testing with a very small, simple index that contains two documents:

[
  {
    "_index": "contacts_4",
    "_type": "_doc",
    "_id": "1",
    "_score": 1.0,
    "_source": {
      "full_name": "Bob Smith"
    }
  },
  {
    "_index": "contacts_4",
    "_type": "_doc",
    "_id": "2",
    "_score": 1.0,
    "_source": {
      "full_name": "Bobby Smith"
    }
  }
]

Solution

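  • Here is a working example using the ngram tokenizer. The same analyzer is applied both to the indexed names and to the query, so "Bob" becomes the gram "bob", which is also produced for "Bobby" and "Bobbette":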

    (Reference: Elasticsearch ngram tokenizer documentation.)

    Mapping

    PUT my_index
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "filter": [
                "lowercase"
              ],
              "type": "custom",
              "tokenizer": "my_tokenizer"
            }
          },
          "tokenizer": {
            "my_tokenizer": {
              "token_chars": [
                "letter",
                "digit"
              ],
              "min_gram": "3",
              "type": "ngram",
              "max_gram": "4"
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "full_name": {
            "type": "text",
            "analyzer": "my_analyzer",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
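    To see which tokens the mapping above produces, here is a rough Python simulation of my_analyzer. It is not Elasticsearch code; it assumes the settings shown (split on anything that is not a letter or digit, emit every 3- and 4-character gram, lowercase) and uses a hypothetical helper named analyze.

```python
import re

# Hypothetical re-implementation of my_analyzer from the mapping above:
# token_chars = letter, digit; min_gram = 3; max_gram = 4; lowercase filter.
MIN_GRAM = 3
MAX_GRAM = 4


def analyze(text: str) -> list[str]:
    """Approximate the tokens my_analyzer would emit for `text`."""
    tokens = []
    # [^\W_]+ keeps runs of letters/digits; spaces and punctuation split words.
    for word in re.findall(r"[^\W_]+", text):
        word = word.lower()  # the lowercase token filter
        for start in range(len(word)):
            for size in range(MIN_GRAM, MAX_GRAM + 1):
                if start + size <= len(word):
                    tokens.append(word[start:start + size])
    return tokens


if __name__ == "__main__":
    for name in ["Bob Smith", "Bobby Smith", "Bobbette Smith"]:
        grams = analyze(name)
        print(f"{name!r}: {len(grams)} tokens, contains 'bob': {'bob' in grams}")
```

    The three names yield 6, 10 and 16 grams respectively, and every list contains "bob". Against a real cluster, GET my_index/_analyze with "analyzer": "my_analyzer" and a "text" value shows the actual tokens. With min_gram set to 3, a two-letter query such as "Bo" produces no grams at all and therefore matches nothing; widening the gap between min_gram and max_gram beyond 1 also requires raising the index.max_ngram_diff setting.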
    

    Indexing documents

    POST my_index/_doc/1
    {
      "full_name":"Bob Smith"
    }
    
    POST my_index/_doc/2
    {
      "full_name":"Bobby Smith"
    }
    
    POST my_index/_doc/3
    {
      "full_name":"Bobbette Smith"
    }
    

    Search Query

    GET my_index/_search
    {
      "query": {
        "match": {
          "full_name": "Bob"
        }
      }
    }
    

    Results

    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.1626403,
        "_source" : {
          "full_name" : "Bob Smith"
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.13703513,
        "_source" : {
          "full_name" : "Bobby Smith"
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.11085624,
        "_source" : {
          "full_name" : "Bobbette Smith"
        }
      }
    ]
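    Why does "Bob Smith" still rank first? The query "Bob" is analyzed with the same analyzer (no search_analyzer is set), so it becomes the single gram "bob", and each document contains that gram once. What separates them is field length: the default BM25 similarity (k1 = 1.2, b = 0.75) lowers the score of a match in a field with more tokens, and "Bob Smith" produces the fewest grams. The sketch below is a simplified re-implementation of that formula, not Elasticsearch's code. It assumes a single shard and the Elasticsearch 7.x form of BM25, which includes a (k1 + 1) factor; the helper names analyze and bm25_scores are hypothetical. Under those assumptions it should land within rounding of the scores above.

```python
import math
import re

# Simplified BM25 sketch. Assumptions: Elasticsearch default k1/b, one shard,
# and the (k1 + 1) factor used by Elasticsearch 7.x.
K1 = 1.2
B = 0.75


def analyze(text: str) -> list[str]:
    """Approximate my_analyzer: split on non-alphanumerics, lowercase, 3/4-grams."""
    grams = []
    for word in re.findall(r"[^\W_]+", text.lower()):
        for start in range(len(word)):
            for size in (3, 4):
                if start + size <= len(word):
                    grams.append(word[start:start + size])
    return grams


def bm25_scores(query: str, docs: dict[str, str]) -> dict[str, float]:
    """Score docs against query; docs sharing no gram with the query are dropped."""
    analyzed = {doc_id: analyze(text) for doc_id, text in docs.items()}
    n_docs = len(analyzed)
    avgdl = sum(len(toks) for toks in analyzed.values()) / n_docs
    scores = {}
    for doc_id, toks in analyzed.items():
        score = 0.0
        for term in set(analyze(query)):  # match query: OR over query grams
            freq = toks.count(term)
            if freq == 0:
                continue
            doc_freq = sum(term in t for t in analyzed.values())
            idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
            # Longer fields (more grams) get a larger norm, hence a lower score.
            norm = K1 * (1 - B + B * len(toks) / avgdl)
            score += (K1 + 1) * idf * freq / (freq + norm)
        if score > 0:
            scores[doc_id] = score
    return scores


if __name__ == "__main__":
    docs = {"1": "Bob Smith", "2": "Bobby Smith", "3": "Bobbette Smith"}
    ranked = sorted(bm25_scores("Bob", docs).items(), key=lambda kv: -kv[1])
    for doc_id, score in ranked:
        print(doc_id, docs[doc_id], round(score, 7))
```

    The only input that differs between the three documents is the field length (6, 10 and 16 grams), which is why the exact name scores highest while the longer names still match with progressively lower scores.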
    

    Hope this helps