Search code examples
node.jselasticsearchsearchquery-stringopensearch

Not getting specific search result on 'query_string' ElasticSearch


The following company details are indexed in my elastic search

[{
    "_index": "test_company",
    "_type": "_doc",
    "_id": "9303000d-b167-4d7c-b233-43a43c4a7e60",
    "_score": 2.4849067,
    "_source": {
        "id": "9303000d-b167-4d7c-b233-43a43c4a7e60",
        "name": "My Property Company",
        "phone": "11484473106",
        "email": "[email protected]",
        "website": "https://www.walmart.com",
        "ein": "731087415",
        "address": "Apt. 837 5371 Carroll Estates, Zboncakbury, LA 53957-3670",
        "zip": "99950",
        "city": "Ketchikan",
        "state": "Alaska",
        "country": "US",
        "fileId": "",
        "status": "Active"
    }
},
{
    "_index": "test_company",
    "_type": "_doc",
    "_id": "f302ef6a-f259-4980-ad1e-48de33e34799",
    "_score": 1.8173966,
    "_source": {
        "byPhone": "byPhone",
        "website": "https://www.walmart.com",
        "zip": "99950",
        "status": "Active",
        "ein": "427297769",
        "createdAt": "2022-11-11T04:39:50.717Z",
        "isDeleted": false,
        "address": "056 Robin Island, Arliefurt, MA 28632-8802",
        "byName": "byName",
        "email": "[email protected]",
        "country": "US",
        "name": "Walmart",
        "state": "Alaska",
        "city": "Ketchikan",
        "byCreatedAt": "byCreatedAt",
        "fileId": "9b705ead-df91-424c-bad9-ccf1db477f3c",
        "updatedAt": "2022-11-11T05:49:02.759Z",
        "byEmail": "byEmail",
        "id": "f302ef6a-f259-4980-ad1e-48de33e34799",
        "phone": "14844731064"
    }
},
{
    "_index": "test_company",
    "_type": "_doc",
    "_id": "270c1a72-0f54-4004-8efc-70fd3adc0b3d",
    "_score": 1.8173966,
    "_source": {
        "id": "270c1a72-0f54-4004-8efc-70fd3adc0b3d",
        "name": "Walmart",
        "phone": "14844731064",
        "email": "[email protected]",
        "website": "https://www.walmart.com",
        "ein": "058107938",
        "address": "0440 Neville Camp, Schinnerbury, AK 30194-5833",
        "zip": "99950",
        "city": "Ketchikan",
        "state": "Alaska",
        "country": "US",
        "fileId": "",
        "status": "Active"
    }
},
{
    "_index": "test_company",
    "_type": "_doc",
    "_id": "b695acb2-980a-459d-88d8-e16fc47eead8",
    "_score": 1.5686159,
    "_source": {
        "id": "b695acb2-980a-459d-88d8-e16fc47eead8",
        "name": "My Property Company",
        "phone": "14844731064",
        "email": "[email protected]",
        "website": "https://www.walmart.com",
        "ein": "040723577",
        "address": "20361 Veum Overpass, Erniechester, ID 19244",
        "zip": "99950",
        "city": "Ketchikan",
        "state": "Alaska",
        "country": "US",
        "fileId": "",
        "status": "Active"
    }
}]

I am trying to search the following string r@hot on the email field using query_string. I am expecting it to return the following data,

    [{
    "_index": "test_company",
    "_type": "_doc",
    "_id": "9303000d-b167-4d7c-b233-43a43c4a7e60",
    "_score": 2.4849067,
    "_source": {
        "id": "9303000d-b167-4d7c-b233-43a43c4a7e60",
        "name": "My Property Company",
        "phone": "11484473106",
        "email": "[email protected]",
        "website": "https://www.walmart.com",
        "ein": "731087415",
        "address": "Apt. 837 5371 Carroll Estates, Zboncakbury, LA 53957-3670",
        "zip": "99950",
        "city": "Ketchikan",
        "state": "Alaska",
        "country": "US",
        "fileId": "",
        "status": "Active"
    }
}]

but it is returning all of the data above ...

So far I have tried many combinations two of them are the following,

{
  "query": {
    "bool": {
      "must": [
        {"query_string": {
          "query": "r@hot*", 
          "fields": ["email"],
          "analyze_wildcard": true
        }}
      ]
    }
  }
}

Result: It returns all the data

{
  "query": {
    "bool": {
      "must": [
        {"query_string": {
          "query": "*r@hot*", 
          "fields": ["email"],
          "analyze_wildcard": true
        }}
      ]
    }
  }
}

Result: It returns nothing

Is there any other way to accomplish what I need?


Solution

  • Tldr;

    This is a case where you hit your head against the text analyser pipeline.

    email is a field of type text. On which the default text analyser is applied.

    To understand

    you can use the following api call:

    GET /_analyze
    {
      "analyzer" : "standard",
      "text" : "[email protected]"
    }
    

    Giving you

    {
      "tokens": [
        {
          "token": "johnie.schinner",
          "start_offset": 0,
          "end_offset": 15,
          "type": "<ALPHANUM>",
          "position": 0
        },
        {
          "token": "hotmail.com",
          "start_offset": 16,
          "end_offset": 27,
          "type": "<ALPHANUM>",
          "position": 1
        }
      ]
    }
    

    Indeed two tokens.

    But when you are going to perform your query against the email field. Your query is also going to be using the pipeline.

    You can notice it if you are to use the explain flag in your query.

    GET 75680788/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "query_string": {
                "query": "r@hot*",
                "fields": [
                  "email"
                ],
                "analyze_wildcard": true
              }
            }
          ]
        }
      },
      "explain": true
    }
    

    You get:

    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 3,
          "relation": "eq"
        },
        "max_score": 0,
        "hits": [
          {
            "_shard": "[75680788][0]",
            "_node": "CTqyybxMQGi088cQT29ewA",
            "_index": "75680788",
            "_id": "7hluxYYBSMeA9tKN7RQT",
            "_score": 0,
            "_source": {
              "id": "b695acb2-980a-459d-88d8-e16fc47eead8",
              "name": "My Property Company",
              "phone": "14844731064",
              "email": "[email protected]",
              "website": "https://www.walmart.com",
              "ein": "040723577",
              "address": "20361 Veum Overpass, Erniechester, ID 19244",
              "zip": "99950",
              "city": "Ketchikan",
              "state": "Alaska",
              "country": "US",
              "fileId": "",
              "status": "Active"
            },
            "_explanation": {
              "value": 0,
              "description": "ConstantScore(email:r email:hot*)^0.0",
              "details": []
            }
          },
          {
            "_shard": "[75680788][0]",
            "_node": "CTqyybxMQGi088cQT29ewA",
            "_index": "75680788",
            "_id": "7xluxYYBSMeA9tKN7RQT",
            "_score": 0,
            "_source": {
              "id": "9303000d-b167-4d7c-b233-43a43c4a7e60",
              "name": "MyPropertyCompany",
              "phone": "11484473106",
              "email": "[email protected]",
              "website": "https://www.walmart.com",
              "ein": "731087415",
              "address": "Apt.8375371CarrollEstates,Zboncakbury,LA53957-3670",
              "zip": "99950",
              "city": "Ketchikan",
              "state": "Alaska",
              "country": "US",
              "fileId": "",
              "status": "Active"
            },
            "_explanation": {
              "value": 0,
              "description": "ConstantScore(email:r email:hot*)^0.0",
              "details": []
            }
          },
          {
            "_shard": "[75680788][0]",
            "_node": "CTqyybxMQGi088cQT29ewA",
            "_index": "75680788",
            "_id": "8BluxYYBSMeA9tKN7RQT",
            "_score": 0,
            "_source": {
              "id": "9303000d-b167-4d7c-b233-43a43c4a7e60",
              "name": "MyPropertyCompany",
              "phone": "11484473106",
              "email": "[email protected]",
              "website": "https://www.walmart.com",
              "ein": "731087415",
              "address": "Apt.8375371CarrollEstates,Zboncakbury,LA53957-3670",
              "zip": "99950",
              "city": "Ketchikan",
              "state": "Alaska",
              "country": "US",
              "fileId": "",
              "status": "Active"
            },
            "_explanation": {
              "value": 0,
              "description": "ConstantScore(email:r email:hot*)^0.0",
              "details": []
            }
          }
        ]
      }
    }
    

    You get 2 queries, (email:r email:hot*) and by default the OR operator is applied.

    Solution

    I suggest you use a type keyword and this should work.

    GET 75680788/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "query_string": {
                "query": "*r@hot*",
                "fields": [
                  "email.keyword"
                ],
                "analyze_wildcard": true
              }
            }
          ]
        }
      },
      "explain": true
    }