
Elasticsearch - Fuzzy and strict match with multiple fields


We want to use Elasticsearch to find similar objects.

Let's say I have an object with four fields: product_name, seller_name, seller_phone, platform_id.

Similar products can have different product names and seller names across different platforms (fuzzy match).

The phone number, on the other hand, is strict: a single-character difference could yield the wrong record (strict match).

What we're trying to create is a query that will:

  1. Take into account all the fields we have for the current record and OR between them.
  2. Require that platform_id is the one I specifically want to look at (AND).
  3. Fuzzy-match the product_name and seller_name.
  4. Strictly match the phone number, or ignore it in the OR between the fields.

In pseudocode, I would write something like:

((product_name like 'some_product_name') OR (seller_name like 'some_seller_name') OR (seller_phone = 'some_phone')) AND (platform_id = 123)


Solution

  • To get an exact match on seller_phone, I index that field without the ngram analyzer, and use fuzzy queries for product_name and seller_name.

    Mapping

    PUT index111
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "edge_n_gram_analyzer": {
              "tokenizer": "whitespace",
              "filter": ["lowercase", "edge_gram_filter"]
            }
          },
          "filter": {
            "edge_gram_filter": {
              "type": "ngram",
              "min_gram": 2,
              "max_gram": 10
            }
          }
        }
      },
      "mappings": {
        "document_type" : {
          "properties": {
            "product_name" : {
              "type": "text",
              "analyzer": "edge_n_gram_analyzer"
            },
            "seller_name" : {
              "type": "text",
              "analyzer": "edge_n_gram_analyzer"
            },
            "seller_phone" : {
              "type": "keyword"
            },
            "platform_id" : {
              "type": "keyword"
            }
          }
        }
      }
    }
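
To see why a fuzzy query can match against this mapping, it helps to look at the terms the ngram filter stores. The following is a rough Python sketch, not Elasticsearch code: the helper names `ngrams` and `edit_distance` are made up for illustration, and it uses plain Levenshtein distance, whereas the fuzzy query also counts a transposition as a single edit by default.

```python
def ngrams(token, min_gram=2, max_gram=10):
    """Return every substring of `token` with length in [min_gram, max_gram].

    Mimics (approximately) lowercase + ngram filtering of a single token.
    """
    token = token.lower()
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Terms stored for a product_name of "macbok" (the sample document's value).
indexed_terms = ngrams("macbok")

# Terms within 2 edits of the misspelled search value "macbouk".
matches = sorted(t for t in indexed_terms if edit_distance("macbouk", t) <= 2)
print(matches)  # ['acbok', 'macbo', 'macbok']
```

Because partial grams such as "macbo" also fall within the allowed distance, combining ngrams with fuzziness 2 is quite permissive; a lower fuzziness may be worth trying if the results are too loose.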
    

    Index documents

    POST index111/document_type
    {
      "product_name": "macbok",
      "seller_name": "apple",
      "seller_phone": "9988",
      "platform_id": "123"
    }
    

    For the following pseudo-SQL query:

    ((product_name like 'some_product_name') OR (seller_name like 'some_seller_name') OR (seller_phone = 'some_phone')) AND (platform_id = 123)
    

    Elastic Query

    POST index111/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "platform_id": {
                  "value": "123"
                }
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "fuzzy": {
                      "product_name": {
                        "value": "macbouk",
                        "boost": 1.0,
                        "fuzziness": 2,
                        "prefix_length": 0,
                        "max_expansions": 100
                      }
                    }
                  },
                  {
                    "fuzzy": {
                      "seller_name": {
                        "value": "apdle",
                        "boost": 1.0,
                        "fuzziness": 2,
                        "prefix_length": 0,
                        "max_expansions": 100
                      }
                    }
                  },
                  {
                    "term": {
                      "seller_phone": {
                        "value": "9988"
                      }
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
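
The query above hard-codes its values. The question also asks to use whichever fields the current record has and to leave the phone out of the OR when it is missing. One way to sketch that is a small builder that assembles the same bool query from a record. The function name `build_similarity_query` and the dict-shaped `record` are assumptions made for this example; it only builds the request body and does not talk to Elasticsearch.

```python
def build_similarity_query(record, platform_id, fuzziness=2):
    """Build the bool query body for one record, skipping fields it lacks."""
    should = []

    # Fuzzy clauses. Fuzzy queries are term-level (not analyzed), so the
    # value is lowercased here to line up with the lowercase filter used
    # at index time.
    for field in ("product_name", "seller_name"):
        value = record.get(field)
        if value:
            should.append(
                {"fuzzy": {field: {"value": value.lower(), "fuzziness": fuzziness}}}
            )

    # Strict clause: only added when the record actually has a phone number.
    phone = record.get("seller_phone")
    if phone:
        should.append({"term": {"seller_phone": {"value": phone}}})

    if not should:
        raise ValueError("record has no searchable fields")

    return {
        "query": {
            "bool": {
                "must": [
                    # AND: restrict to the requested platform.
                    {"term": {"platform_id": {"value": str(platform_id)}}},
                    # OR: at least one of the collected clauses must match.
                    {"bool": {"should": should}},
                ]
            }
        }
    }


# A record without a phone number yields only the two fuzzy clauses.
query = build_similarity_query(
    {"product_name": "Macbouk", "seller_name": "apdle"}, platform_id=123
)
```

The returned dict can be serialized to JSON and sent as the body of `POST index111/_search`.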
    

    Hope this helps