
Elasticsearch case-insensitive wildcard search with spaced words


The field priorityName is of the search_as_you_type data type.

My use case: I want to search for documents using the following terms:

  1. "let's" -> should give both the results
  2. "DOING" -> should give both the results
  3. "are you" -> should give both the results
  4. "Are You" -> should give both the results
  5. "you do" (a prefix of "you doing") -> should give both the results
  6. "re you" -> should give both the results

Of these 6 queries, only the first 5 give the desired result with multi_match. How can I make the 6th case work, where the search term is a fragment that does not start at the beginning of a word?
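For context on why only the 6th case fails: search_as_you_type indexes prefix-style (edge) grams, so a fragment like re that starts mid-word is never indexed, while full (infix) n-grams would cover it. A rough Python sketch of the difference between the two token sets (this only simulates the idea, it is not Elasticsearch itself):

```python
def edge_ngrams(token, max_len=8):
    # Prefix-only grams, e.g. "are" -> {"a", "ar", "are"}
    return {token[:i] for i in range(1, min(len(token), max_len) + 1)}

def all_ngrams(token, min_len=1, max_len=8):
    # Every substring (infix grams), e.g. "are" -> {"a", "ar", "are", "r", "re", "e"}
    return {token[i:j] for i in range(len(token))
            for j in range(i + min_len, min(i + max_len, len(token)) + 1)}

print("re" in edge_ngrams("are"))  # False: prefix grams miss mid-word fragments
print("re" in all_ngrams("are"))   # True: infix grams cover "re"
```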

Sample docs

      {
        "_index": "priority",
        "_type": "_doc",
        "_id": "vaCI_HAB31AaC-t5TO9H",
        "_score": 1,
        "_source": {
          "priorityName": "What are you doing along Let's Go out"
        }
      },
      {
        "_index": "priority",
        "_type": "_doc",
        "_id": "vqCQ_HAB31AaC-t5wO8m",
        "_score": 1,
        "_source": {
          "priorityName": "what are you doing along let's go for shopping"
        }
      }
    ]
  }

Solution

  • For the last search, re you, you need infix tokens, and by default these are not generated by the search_as_you_type data type. I would suggest creating a custom analyzer that produces infix tokens and lets you match all 6 of your queries.

    I have already created such a custom analyzer and tested it with your sample documents; all 6 queries return both of them.

    Index mapping

    PUT /infix-index

    {
        "settings": {
            "max_ngram_diff": 50,
            "analysis": {
                "filter": {
                    "autocomplete_filter": {
                        "type": "ngram",
                        "min_gram": 1,
                        "max_gram": 8
                    }
                },
                "analyzer": {
                    "autocomplete_analyzer": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        "filter": [
                            "lowercase",
                            "autocomplete_filter"
                        ]
                    },
                    "lowercase_analyzer": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        "filter": [
                            "lowercase"
                        ]
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "priorityName": {
                    "type": "text",
                    "analyzer": "autocomplete_analyzer",
                    "search_analyzer": "standard"
                }
            }
        }
    }

    Note the search_analyzer set to standard: documents are ngrammed at index time by autocomplete_analyzer, but the query text is only lowercased and tokenized at search time, so a fragment like re is matched as-is against the stored ngram tokens instead of being ngrammed itself.
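    To see what the autocomplete_analyzer above produces at index time, here is a rough Python simulation of its pipeline (whitespace tokenizer, lowercase filter, ngram filter with min_gram 1 / max_gram 8). It approximates the token set; it is not the actual Lucene implementation:

```python
def autocomplete_analyze(text, min_gram=1, max_gram=8):
    # whitespace tokenizer + lowercase filter
    words = text.lower().split()
    # ngram token filter: every substring of each word, lengths 1..8
    grams = set()
    for w in words:
        for i in range(len(w)):
            for j in range(i + min_gram, min(i + max_gram, len(w)) + 1):
                grams.add(w[i:j])
    return grams

tokens = autocomplete_analyze("What are you doing along Let's Go out")
print("re" in tokens, "you" in tokens)  # True True: infix grams are indexed
```

    You can inspect the real token output with Elasticsearch's _analyze API on the index to confirm.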
    

    Index your sample docs

    {
      "priorityName" : "What are you doing along Let's Go out"
    }
    
    {
      "priorityName" : "what are you doing along let's go for shopping"
    }
    

    Search query for the last case, re you

    {
        "query": {
            "match" : {
                "priorityName" : "re you"
            }
        }
    }
    

    And result

    "hits": [
          {
            "_index": "ngram",
            "_type": "_doc",
            "_id": "1",
            "_score": 1.4652853,
            "_source": {
              "priorityName": "What are you doing along Let's Go out"
            }
          },
          {
            "_index": "ngram",
            "_type": "_doc",
            "_id": "2",
            "_score": 1.4509768,
            "_source": {
              "priorityName": "what are you doing along let's go for shopping"
            }
          }
    

    The other queries also returned both documents; I'm not including the output, to keep this answer short.
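    Using the same simulation idea, you can sanity-check that all 6 queries hit both documents: the standard search analyzer just lowercases and splits the query, and each query word must then appear in a document's indexed gram set. (This models a stricter AND match; the default match query ORs terms, so real Elasticsearch would match at least as much.)

```python
def grams(text, min_gram=1, max_gram=8):
    # same approximation of the index-time analyzer as above
    out = set()
    for w in text.lower().split():
        for i in range(len(w)):
            for j in range(i + min_gram, min(i + max_gram, len(w)) + 1):
                out.add(w[i:j])
    return out

docs = ["What are you doing along Let's Go out",
        "what are you doing along let's go for shopping"]
queries = ["let's", "DOING", "are you", "Are You", "you do", "re you"]

for q in queries:
    # a doc "matches" if every query word is among its indexed grams
    hits = [d for d in docs if all(w in grams(d) for w in q.lower().split())]
    print(q, "->", len(hits), "hits")  # 2 hits for every query
```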

    Note: below are some links that explain this answer in more detail.

    https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html

    https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html