Search code examples
elasticsearchelastic-stackelasticsearch-5aws-elasticsearch

exact match with IN Operator in elasticsearch


How to find exact match of multiple text values with IN operation? e.g.,

SELECT * WHERE name in ('george','michael') AND testID in (1,2)

When I try to run below query, it returns extra data for matching values e.g., "the george", not for the exact values only as "george"

{  
   "query":{  
      "bool":{  
         "filter":[  
            {  
               "query_string":{  
                  "fields":[  
                     "name"
                  ],
                  "query":"george or michael"
               }
            },
            {  
               "terms":{  
                  "testID":[1,2]
               }
            }
         ]
      }
   }
}

Solution

  • That is because your field name is of type text.

    When datatype if of text, Elasticsearch breaks the values(for e.g The George or George Washington into tokens [the, george] or [george, washington] and saves these tokens in inverted index. Hence when you search on this field using the query you have, it would also return these documents i.e.what you say as not exact match.

    For exact match I suggest you make use of keyword type using lowercase normalizer as mentioned in below mapping:

    PUT <your_index_name>
    {  
       "settings":{  
          "analysis":{  
             "normalizer":{  
                "my_custom_normalizer":{  
                   "type":"custom",
                   "filter":[  
                      "lowercase"
                   ]
                }
             }
          }
       },
       "mappings":{  
          "mydocs":{  
             "properties":{  
                "name":{  
                   "type":"keyword",
                   "normalizer":"my_custom_normalizer"
                }
             }
          }
       }
    }
    

    By doing so your values would be saved as is for e.g. based on above mapping, it would convert The George or George Washington into the george or george washtingon and then store them as is in the inverted index.

    You can probably have the query you've mentioned then working as expected.

    Hope it helps!