Tags: elasticsearch, search, tokenize

Elasticsearch match on contained phrase with whitespace


I need a search that matches when a key phrase is contained in the query; the key phrase can contain whitespace, and the whole phrase must be present.

The way I understand it, the index_analyzer and search_analyzer can each either split on spaces or not, giving four possibilities - none of which seems to do what I need.

As an example, let's say the key phrase is "one two". That means I would like a search with "one two" or "one two three" to match, but not one with just "one". Considering the different options:

  1. Split on both index and search -> doesn't work because "one" will match
  2. Split on index but not on search -> doesn't work because "one two" won't match
  3. Don't split on index, split on search -> doesn't work because "one two" won't match
  4. Split on neither index nor search -> doesn't work because "one two three" won't match
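To see why each combination fails, here is a minimal sketch (plain Python, not Elasticsearch) that simulates a whitespace analyzer and the OR semantics of a standard match query; the function names are made up for illustration:

```python
def analyze(text, split):
    """Simulate an analyzer: whitespace-tokenize, or keep the text as one token."""
    return text.split() if split else [text]

def matches(indexed_value, query, split_index, split_search):
    """Simulate a match query: true if any search token equals any index token."""
    index_tokens = analyze(indexed_value, split_index)
    search_tokens = analyze(query, split_search)
    return any(tok in index_tokens for tok in search_tokens)

key_phrase = "one two"

# 1. Split on both: the lone token "one" matches -> too broad
print(matches(key_phrase, "one", True, True))              # True (unwanted)
# 2. Split on index only: "one two" stays one search token -> no match
print(matches(key_phrase, "one two", True, False))         # False (unwanted)
# 3. Split on search only: no single search token equals "one two" -> no match
print(matches(key_phrase, "one two", False, True))         # False (unwanted)
# 4. Split on neither: "one two three" != "one two" -> no match
print(matches(key_phrase, "one two three", False, False))  # False (unwanted)
```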

Solution

  • One possible solution is to create a new mapping for that field with type keyword; then it won't be analyzed by Elasticsearch and will be stored "as is" (you can still run a normalizer against it if you need to process/change it in some way). Then you don't need to deal with analyzers at all.

    Let's say you have a field named description; then the mapping might look like this:

    {
      ...
      "description": {
        "type": "text", // assuming you originally have it as text
        "fields": {
          "original": {
            "type": "keyword",
            "ignore_above": 512 // optional; if you skip it, ES applies its default
          }
        }
      }
    }

    The above mapping tells Elasticsearch to keep two versions of the field: the default analyzed one and a new one that is not analyzed. You can then access it under the name description.original and use, for example, a wildcard search.
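
    With the original subfield in place, a contains-style search can be sketched as a wildcard query ("my-index" below is a placeholder for your index name):

    GET /my-index/_search
    {
      "query": {
        "wildcard": {
          "description.original": {
            "value": "*one two*"
          }
        }
      }
    }

    This matches documents whose description.original contains "one two" with the whitespace intact. Keep in mind that wildcard queries with a leading * can be slow on large indices.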