Elasticsearch - Index Mapping settings for both exact and partial matching


I'm new to Elasticsearch and am trying to learn which mapping settings to use at index time to achieve the following.

If I have a document like this:

{"name":"Galapagos Islands"}

I want to get it back as a result for both of the following queries:

1) Partial matching

{
    "query": {
        "match": {
            "name": "ga"
        }
    }
}

2) Exact matching

{
    "query": {
        "term": {
            "name": "Galapagos Islands"
        }
    }
}

With my current settings, partial matching works, but exact matching returns no results. Below are the settings I indexed with.

{
  "mappings": {
        "islands": {
            "properties": {
                "name":{
                    "type": "string",
                    "index_analyzer": "autocomplete",
                    "search_analyzer": "search_ngram"
                }
            }
        }
    },

  "settings":{
    "analysis":{
      "analyzer":{
        "autocomplete":{
          "type":"custom",
          "tokenizer":"standard",
          "filter":[ "standard", "lowercase", "stop", "kstem", "ngram" ] 
        },
        "search_ngram": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": "lowercase"
        }
      },
      "filter":{
        "ngram":{
          "type":"ngram",
          "min_gram":2,
          "max_gram":15
        }
      }
    }
  }
}

What is the correct way to do both exact matching and partial matching on a field?
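
One way to see what these settings do is to mimic the analysis chain outside Elasticsearch. The Python sketch below is only an approximation, not Elasticsearch code: `ngrams` and `autocomplete_terms` are made-up helper names, whitespace splitting stands in for the standard tokenizer, and the `stop` and `kstem` filters are left out. It suggests why the partial query can match while the term query on `name` cannot.

```python
def ngrams(token, min_gram=2, max_gram=15):
    """Return every substring of token with length in [min_gram, max_gram]."""
    grams = set()
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def autocomplete_terms(text):
    """Rough stand-in for the `autocomplete` analyzer (assumption: whitespace
    split instead of the standard tokenizer; `stop` and `kstem` are omitted)."""
    terms = set()
    for token in text.split():
        terms |= ngrams(token.lower())
    return terms


# Terms that would be stored for the `name` field, under the assumptions above.
indexed = autocomplete_terms("Galapagos Islands")

# match query: "ga" passes through search_ngram (keyword + lowercase) and stays "ga".
print("ga".lower() in indexed)         # True

# term query: the text is looked up verbatim, with no analysis applied.
print("Galapagos Islands" in indexed)  # False
```

The indexed terms are all lowercase fragments of single words, so a term lookup for the full, capitalized, two-word string has nothing to match.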

UPDATE

After recreating the index with the settings given below, my mappings look like this:

curl -XGET 'localhost:9200/testing/_mappings?pretty'
{
  "testing" : {
    "mappings" : {
      "islands" : {
        "properties" : {
          "name" : {
            "type" : "string",
            "index_analyzer" : "autocomplete",
            "search_analyzer" : "search_ngram",
            "fields" : {
              "raw" : {
                "type" : "string",
                "analyzer" : "my_keyword_lowercase_analyzer"
              }
            }
          }
        }
      }
    }
  }
}

My index settings are below:

{
  "mappings": {
        "islands": {
            "properties": {
                "name":{
                    "type": "string",
                    "index_analyzer": "autocomplete",
                    "search_analyzer": "search_ngram",
                    "fields": {
                      "raw": {
                          "type": "string",
                          "analyzer": "my_keyword_lowercase_analyzer"
                      }
                    }
                }
            }
        }
    },

  "settings":{
    "analysis":{
      "analyzer":{
        "autocomplete":{
          "type":"custom",
          "tokenizer":"standard",
          "filter":[ "standard", "lowercase", "stop", "kstem", "ngram" ] 
        },
        "search_ngram": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": "lowercase"
        },
        "my_keyword_lowercase_analyzer": {
          "type": "custom",
          "filter": ["lowercase"],
          "tokenizer": "keyword"
        }
      },
      "filter":{
        "ngram":{
          "type":"ngram",
          "min_gram":2,
          "max_gram":15
        }
      }
    }
  }
}

With all of the above in place, this query still returns no hits:

curl -XGET 'localhost:9200/testing/islands/_search?pretty' -d '{"query": {"term": {"name.raw" : "Galapagos Islands"}}}'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

And my document is this:

curl -XGET 'localhost:9200/testing/islands/1?pretty'
{
  "_index" : "testing",
  "_type" : "islands",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source":{"name":"Galapagos Islands"}
}

Solution

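  • Add a subfield to your name property that is not_analyzed. Or, if you want matching to ignore case, give the subfield an analyzer that combines the keyword tokenizer with a lowercase filter.

    Either way, the whole value is indexed as a single term instead of being split into words and n-grams: unchanged with not_analyzed, lowercased with the keyword/lowercase analyzer. Then you can run your term search against the subfield.

    For example, an analyzer with the keyword tokenizer and the lowercase filter: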

        "my_keyword_lowercase_analyzer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ],
          "tokenizer": "keyword"
        }
    

    And the mapping:

            "properties": {
                "name":{
                    "type": "string",
                    "index_analyzer": "autocomplete",
                    "search_analyzer": "search_ngram",
                    "fields": {
                        "raw": {
                            "type": "string",
                            "analyzer": "my_keyword_lowercase_analyzer"
                        }
                    }
                }
            }
    

    The query to be used is:

    {
        "query": {
            "term": {
                "name.raw": "galapagos islands"
            }
        }
    }
    

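    So, instead of querying the name field, query name.raw (the subfield). Because a term query is not analyzed, the value you pass must already be lowercase ("galapagos islands", not "Galapagos Islands") to match what the lowercase filter indexed.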
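
To make the difference concrete, here is a small Python sketch of how the name.raw subfield behaves. It only imitates the behaviour and does not call Elasticsearch; `keyword_lowercase`, `term_query` and `match_query` are hypothetical helpers written for this illustration.

```python
def keyword_lowercase(text):
    """Stand-in for my_keyword_lowercase_analyzer: the keyword tokenizer keeps
    the whole input as one token, and the lowercase filter lowercases it."""
    return [text.lower()]


# Terms stored for name.raw when the document is indexed.
indexed_terms = set(keyword_lowercase("Galapagos Islands"))


def term_query(value):
    """A term query looks up the value exactly as given, without analysis."""
    return value in indexed_terms


def match_query(value):
    """A match query analyzes the value with the field's analyzer first."""
    return all(term in indexed_terms for term in keyword_lowercase(value))


print(term_query("Galapagos Islands"))   # False: the indexed term is lowercase
print(term_query("galapagos islands"))   # True
print(match_query("Galapagos Islands"))  # True: the query text is lowercased first
```

If you would rather not lowercase the input yourself, a match query on name.raw should behave like the last call, since match queries run the query text through the field's analyzer.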