Tags: elasticsearch, elastic-stack, elasticsearch-plugin, elasticsearch-analyzers

How to give higher score to exact searches than phonetic ones in Elasticsearch?


I am currently using Elasticsearch's phonetic analyzer. I want the query to give a higher score to exact matches than to phonetic ones. Here is the query I am using:


{
    "query": {
        "multi_match" : {
            "query" : "Abhijeet",
            "fields" : ["content", "title"]
        }
    },
    "size": 10,
    "_source": [ "title", "bench", "court", "id_" ],
    "highlight": {
        "fields" : {
            "title" : {},
            "content": {}
        }
    }
}


When I search for Abhijeet, the top results contain Abhijit, and documents containing Abhijeet only come later. I want the exact matches to appear first, every time, followed by the phonetic ones. Can this be done?

Edit:

Mappings

{
    "courts_2": {
        "mappings": {
            "properties": {
                "author": {
                    "type": "text",
                    "analyzer": "my_analyzer"
                },
                "bench": {
                    "type": "text",
                    "analyzer": "my_analyzer"
                },
                "citation": {
                    "type": "text"
                },
                "content": {
                    "type": "text",
                    "analyzer": "my_analyzer"
                },
                "court": {
                    "type": "text"
                },
                "date": {
                    "type": "text"
                },
                "id_": {
                    "type": "text"
                },
                "title": {
                    "type": "text",
                    "analyzer": "my_analyzer"
                },
                "verdict": {
                    "type": "text"
                }
            }
        }
    }
}

Here is the code I used to set up the phonetic analyzer:

{
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "tokenizer": "standard",
                        "filter": [
                            "lowercase",
                            "my_metaphone"
                        ]
                    }
                },
                "filter": {
                    "my_metaphone": {
                        "type": "phonetic",
                        "encoder": "metaphone",
                        "replace": true
                    }
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "author": {
                "type": "text",
                "analyzer": "my_analyzer"
            },
            "bench": {
                "type": "text",
                "analyzer": "my_analyzer"
            },
            "citation": {
                "type": "text"
            },
            "content": {
                "type": "text",
                "analyzer": "my_analyzer"
            },
            "court": {
                "type": "text"
            },
            "date": {
                "type": "text"
            },
            "id_": {
                "type": "text"
            },
            "title": {
                "type": "text",
                "analyzer": "my_analyzer"
            },
            "verdict": {
                "type": "text"
            }
        }
    }
}

Now, I want to query only the title and content fields, with the exact matches appearing first and the phonetic ones after them.
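A quick way to see why Abhijit can rank above Abhijeet is to run both spellings through my_analyzer with the _analyze API, sending the body below to `GET courts_2/_analyze`. If the two names come back as the same metaphone token, the indexed title and content fields cannot tell them apart: because the filter is configured with "replace": true, the original spelling is not kept in the index, so an exact spelling earns no extra score and the order between the two is decided by other factors such as term frequency and field length.

```json
{
  "analyzer": "my_analyzer",
  "text": "Abhijeet Abhijit"
}
```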


Solution

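  • The general solution approach is:

    • use a bool query,
    • with your phonetic query/queries in the must clause,
    • and the non-phonetic query/queries in the should clause.

    Every hit still has to match the phonetic query, so the result set does not change, but documents that also match the non-phonetic (exact) query get that query's score added on top, which pushes them up the ranking.

    I can update the answer if you include the mappings and settings of your index in your question.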

    Update: Solution Approach

    A. Expand your mapping to use multi-fields for title and content:

    "title": {
      "type": "text",
      "analyzer": "my_analyzer",
      "fields" : {
        "standard" : {
          "type" : "text"
        }
      }
    },
    ...
    "content": {
      "type": "text",
      "analyzer": "my_analyzer",
      "fields" : {
        "standard" : {
          "type" : "text"
        }
      }
    },
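    On an existing index, the sub-fields from step A can be added with the update mapping API, assuming Elasticsearch 7 or later (typeless mappings, as in the mapping output in the question). Below is a sketch of the body for `PUT courts_2/_mapping`. The existing type and analyzer are repeated unchanged; the "analyzer": "standard" on each sub-field is spelled out here for clarity, although it is also the default when the index defines no default analyzer of its own.

    ```json
    {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "my_analyzer",
          "fields": {
            "standard": { "type": "text", "analyzer": "standard" }
          }
        },
        "content": {
          "type": "text",
          "analyzer": "my_analyzer",
          "fields": {
            "standard": { "type": "text", "analyzer": "standard" }
          }
        }
      }
    }
    ```

    Documents indexed before this change will not have the new sub-fields populated until they are re-indexed.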
    

    B. Populate the new sub-fields for documents that are already indexed, e.g. with an update-by-query, which re-indexes every document in place:

    POST courts_2/_update_by_query
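    Once the update-by-query has finished, a rough check that the sub-fields are populated is to count matches on one of them by sending this body to `GET courts_2/_count`. This assumes the exact spelling Abhijeet actually occurs in some titles; if it does, a count of zero suggests title.standard is still empty.

    ```json
    {
      "query": {
        "match": { "title.standard": "Abhijeet" }
      }
    }
    ```

    If documents are being written while the update-by-query runs, adding `?conflicts=proceed` to that request makes it continue past version conflicts instead of aborting.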
    

    C. Adjust your query to leverage the newly introduced fields:

    GET courts_2/_search
    {
      "_source": ["title","bench","court","id_"],
      "size": 10,
      "query": {
        "bool": {
          "must": {
            "multi_match": {
              "query": "Abhijeet",
              "fields": ["title", "content"]
            }
          },
          "should": {
            "multi_match": {
              "query": "Abhijeet",
              "fields": ["title.standard", "content.standard"]
            }
          }
        }
      },
      "highlight": {
        "fields": {
          "title": {},
          "content": {}
        }
      }
    }
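    The should clause adds its score on top of the phonetic score, which usually lifts exact matches to the top but does not strictly guarantee it: a document with several phonetic hits in a short title can still outscore one with a single exact hit in a long content field. If that happens, one option (an addition of mine, not part of the approach above) is to weight the exact-match clause more heavily with the standard `boost` parameter. The value 5 below is arbitrary and should be tuned against your own data; the body is otherwise the same request to `GET courts_2/_search`.

    ```json
    {
      "_source": ["title", "bench", "court", "id_"],
      "size": 10,
      "query": {
        "bool": {
          "must": {
            "multi_match": {
              "query": "Abhijeet",
              "fields": ["title", "content"]
            }
          },
          "should": {
            "multi_match": {
              "query": "Abhijeet",
              "fields": ["title.standard", "content.standard"],
              "boost": 5
            }
          }
        }
      },
      "highlight": {
        "fields": {
          "title": {},
          "content": {}
        }
      }
    }
    ```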