elasticsearch, search, full-text-search, querydsl

Modify Input Dynamically to Search against data stored in Elasticsearch


I am new to Elasticsearch and have been reading a lot about it, but I am stuck on one requirement.

Consider a field of type text named "app_data" that is present in all the documents of an index.

The app_data field always stores a single word, but that word can be alphanumeric, numeric, or alphabetic.

Requirement -

One type of word stored in app_data looks something like -

app_data:"99IPAB999999FG"

Now if the user wants to search for this app_data, they enter something like

"99.IPAB.99999.9"

Another example - data in index:

app_data:"78IGDB900459JI"

User searches like - "78.IGDB.90045.9"

How should I form an ES query to match the data stored in the index docs, if this is feasible?

Considerations -

  1. I cannot edit the data (using a custom analyser) during insertion to the index as app_data can have simple words like "RED", "RED567".
  2. Only for the problem mentioned above, I think I have to use a custom analyser along with query DSL.

Solution

  • Assuming that your data is already indexed and cannot be changed, my suggestion is to normalize the term before sending it in the query by removing the ".", e.g. 99.IPAB.99999.9 -> 99IPAB999999 (lowercased to 99ipab999999 at analysis time).

    The normalized term is then a prefix of the indexed value, so a match_phrase_prefix query can match it.

    If you cannot normalize the input before it reaches Elasticsearch, you can do so at search time with a custom analyzer (as a "search_analyzer" or through the query's "analyzer" parameter).

    The idea is to create an analyzer that generates the token without the ".". Add "analyzer": "my_analyzer" to your query so that the search term is tokenized without the "." and the match works.

    New analyzer:

    PUT my-index-000001
    {
      "settings": {
        "analysis": {
          "char_filter": {
            "my_char_filter": {
              "type": "pattern_replace",
              "pattern": """\.""",
              "replacement": ""
            }
          },
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "standard",
              "filter": [
                "lowercase"
              ],
              "char_filter": [
                "my_char_filter"
              ]
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "app_data": {
            "type": "text",
            "analyzer": "standard"
          }
        }
      }
    }
    

    Index sample documents and query

    POST my-index-000001/_bulk
    {"index":{}}
    {"app_data":"99IPAB999999FG"}
    {"index":{}}
    {"app_data":"78IGDB900459JI"}
    
    POST my-index-000001/_search
    {
      "from": 0,
      "size": 5,
      "query": {
       "match_phrase_prefix": {
         "app_data": {
           "query": "78.IGDB.90045.9",
           "analyzer": "my_analyzer"
         }
       }
      }
    }
    

    Hits

    "hits": [
      {
        "_index": "my-index-000001",
        "_id": "_jNxfoUBQB-6H-4Z6KWM",
        "_score": 0.6931471,
        "_source": {
          "app_data": "78IGDB900459JI"
        }
      }
    ]
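
The first suggestion in the answer (stripping the "." on the client before the term reaches the query) could look roughly like the Python sketch below. The function names `normalize_app_data_query` and `build_search_body` are hypothetical helpers, not part of any Elasticsearch client. The sketch assumes "." is the only separator users type, and that the resulting body is sent to `POST my-index-000001/_search` with whatever HTTP client you already use. No "analyzer" parameter is set here because the term no longer contains dots and the field's own analyzer takes care of lowercasing.

```python
import json


def normalize_app_data_query(user_input: str) -> str:
    """Strip the "." separators the user types, e.g. 78.IGDB.90045.9 -> 78IGDB900459."""
    return user_input.replace(".", "")


def build_search_body(user_input: str, size: int = 5) -> dict:
    """Build a match_phrase_prefix request body for the app_data field."""
    return {
        "from": 0,
        "size": size,
        "query": {
            "match_phrase_prefix": {
                "app_data": {"query": normalize_app_data_query(user_input)}
            }
        },
    }


if __name__ == "__main__":
    body = build_search_body("78.IGDB.90045.9")
    print(json.dumps(body, indent=2))
    # Sanity check: the normalized term is a prefix of the stored value.
    assert "78IGDB900459JI".startswith(normalize_app_data_query("78.IGDB.90045.9"))
```

Inputs without dots, such as "RED567", pass through unchanged, so the same helper can be applied to every search term for this field.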