Search code examples
elasticsearchelastic-stackquerydslelasticsearch-dsl

Elasticsearch query to search from different fields non-strictly


I'm relatively new with elasticsearch and have been working with elasticsearch in python. What I had was people's data in csv format which I had converted to json and had added to elasticsearch index. The fields are the following:

'Last Name (Legal Name)', 
'First Name', 
'Middle Name', 
'Other Last Name', 
'Business Mailing Address City Name', 
'Business Mailing Address State Name',
'Business Practice Location Address City Name', 
'Business Practice Location Address State Name', 
'Authorized Official Last Name', 
'Authorized Official First Name', 
'Authorized Official Middle Name', 
'Authorized Official Title or Position'

What I want to put in the query is something like "Name City". For Example, "Clinton Adams Hamilton" where Clinton Adams is the name and Hamilton is the city. In most of the cases I won't be sure which is the first name or which is the last, so I will need to match from all the fields.

What I've used till now is something like this:

"query":{
                "query_string":{
                    "fields": ['Last Name (Legal Name)', 'First Name', 'Middle Name', 
                             'Other Last Name', 'Business Mailing Address City Name', 
                             'Business Mailing Address State Name',
                             'Business Practice Location Address City Name', 'Business Practice                            
                              Location Address State Name', 
                             'Authorized Official Last Name', 'Authorized Official First Name', 
                             'Authorized Official Middle Name', 
                             'Authorized Official Title or Position'],
                    "query": "(Clinton) AND (Adams) AND (Hamilton)",
                    }
                }

The following query works fine if I search the exact name and City from the database but if I have a spelling mistake in the name or if in some name the first name is abbreviated, it doesn't give the expected results. For example if the query is something like "Clinton A Hamilton", it wouldn't match with any documents. I can't use the ORoperator as there are multiple people with similar names, so all of the parts of the query - first name/last name and city are important. I would want the query to fetch the most relevant record from the index.

I tried my best to explain the situation. In any case, feel free to ask if anything isn't really clear. Appreciate your suggestions. Thanks.


Solution

  • The first thing I would do is to refine your document mapping. In particular, I would think about whether it makes sense to keep so many different name-related fields (e.g., first name, last name, middle name, official authorized ...), or if it would make sense to filter some and merge something else. For example, would this document mapping make sense for you?

    {
      'name', 
      'business_mailing': {
        'city',
        'state'
      },
      'business_practice_location': {
        'city',
        'state'
      }
    }
    

    Point is, you should optimise your data based on the queries you'd like to run against it.

    With the mapping above, then you could run a boolean query (a must) that contains two match queries, possibly with the fuzzyness option configured so as to account for typos. E.g.,

    {
      'query': {
        'bool': {
          'must': [{
            'match': {
              'name': { 'query': 'Clinton Adams', 'fuzzyness': 'AUTO'
            }
          },{
            'match': {
              'business_mailing': { 'query': 'Hamilton', 'fuzzyness': 'AUTO'
            }
          }]
        }
      }
    }
    

    Another solution could be to use the copy_to mapping option and define a new queryable field that includes the values of the name fields and city fields. Here is the link to the documentation https://www.elastic.co/guide/en/elasticsearch/reference/7.5/copy-to.html