elasticsearch partial scoring exact-match

ElasticSearch: Partial/Exact Scoring with edge_ngram & fuzziness

In ElasticSearch I am trying to get correct scoring using edge_ngram with fuzziness. I would like exact matches to have the highest score and sub matches have lesser scores. Below is my setup and scoring results.

settings: {
          number_of_shards: 1,
          analysis: {
             filter: {
                ngram_filter: {
                   type: 'edge_ngram',
                   min_gram: 2,
                   max_gram: 20
                }
             },
             analyzer: {
                ngram_analyzer: {
                   type: 'custom',
                   tokenizer: 'standard',
                   filter: [
                      'lowercase',
                      'ngram_filter'
                   ]
                }
             }
          }
       },
    mappings: [{
          name: 'voter',
          _all: {
                'type': 'string',
                'index_analyzer': 'ngram_analyzer',
                'search_analyzer': 'standard'
             },
             properties: {
                last: {
                   type: 'string',
                   required : true,
                   include_in_all: true,
                   term_vector: 'yes',
                   index_analyzer: 'ngram_analyzer',
                   search_analyzer: 'standard'
                },
                first: {
                   type: 'string',
                   required : true,
                   include_in_all: true,
                   term_vector: 'yes',
                   index_analyzer: 'ngram_analyzer',
                   search_analyzer: 'standard'
                },

             }

       }]

After doing a POST with first name "Michael" I do a query as below with changes "Michael", "Michae", "Micha", "Mich", "Mic", and "Mi".

GET voter/voter/_search
{
 "query": {
    "match": {
      "_all": {
        "query": "Michael",
        "fuzziness": 2,
        "prefix_length": 1
      }
    }
  }
}

My score results are:

-"Michael": 0.19535106
-"Michae": 0.2242768
-"Micha": 0.24513611
-"Mich": 0.22340237
-"Mic": 0.21408978
-"Mi": 0.15438235

As you can see the score results aren't getting as expected. I would like "Michael" to have the highest score and "Mi" to have the lowest

Any help would be appreciated!

Solution

One way to approach this problem would be to add raw version of text in your mapping like this

                   last: {
                       type: 'string',
                       required : true,
                       include_in_all: true,
                       term_vector: 'yes',
                       index_analyzer: 'ngram_analyzer',
                       search_analyzer: 'standard',
                       "fields": {
                            "raw": { 
                               "type":  "string"  <--- index with standard analyzer
                              }
                          }
                    },
                    first: {
                       type: 'string',
                       required : true,
                       include_in_all: true,
                       term_vector: 'yes',
                       index_analyzer: 'ngram_analyzer',
                       search_analyzer: 'standard',
                       "fields": {
                            "raw": { 
                               "type":  "string"  <--- index with standard analyzer
                              }
                          }
                    },

You could also make it exact with index : not_analyzed

Then you can query like this

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "_all": {
              "query": "Michael",
              "fuzziness": 2,
              "prefix_length": 1
            }
          }
        },
        {
          "match": {
            "last.raw": {
              "query": "Michael",
              "boost": 5
            }
          }
        },
        {
          "match": {
            "first.raw": {
              "query": "Michael",
              "boost": 5
            }
          }
        }
      ]
    }
  }
}

Documents that matches more clauses will be scored higher. You could specify boost according to your requirements.