
Wildcard/regexp query with asterisk character


I'm new to Elasticsearch. I would like to write a query that matches a document with the following attribute:

description: 20*40x555

I want the query to match the following inputs:

  • 20X40X555
  • 20*40*555
  • 20x40x555
  • any combination of these

I tried both the wildcard and the regexp query. If the document's attribute does not contain any asterisk characters, the query works fine, but when the description attribute contains a * character, the document is not found.

I tried with the following queries:

with wildcard:

{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "short_description": "*20?40?555*"
          }
        }
      ]
    }
  }
}

with regexp:

{
  "query": {
    "bool": {
      "must": [
        {
          "regexp": {
            "short_description": ".*20.*40.*555.*"
          }
        }
      ]
    }
  }
}

When the short_description attribute is, e.g., 20x40x555, both of these work, but if I change the value to 20*40x555, neither returns the document.

How can I get results when the value of short_description is, e.g., 20*40x555? Thanks!


Solution

  • Assuming that you index with the standard analyzer, the "*" is no longer present in the index: the text 20*40x555 is indexed as two tokens, 20 and 40x555. Wildcard and regexp queries are matched against individual indexed terms, so neither token can match a pattern that spans the whole value. The simplest way to make * interchangeable with x is to replace it with x at index time by using a pattern_replace character filter. Here is a simple example that illustrates this idea:

    DELETE test
    
    PUT test
    {
      "settings": {
        "analysis": {
          "char_filter": {
            "asterisk_to_x": {
              "type": "pattern_replace",
              "pattern": "(\\S)\\*(\\S)",
              "replacement": "$1x$2"
            }
          },
          "analyzer": {
            "custom_analyzer": {
              "tokenizer": "keyword",
              "char_filter": ["asterisk_to_x"],
              "filter": ["lowercase"]
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "short_description": {
            "type": "text", 
            "analyzer": "custom_analyzer"
          }
        }
      }
    }
    
    POST test/_bulk?refresh
    { "index": { "_id": "1" } }
    {"short_description":"20x40*555"}
    
    
    POST test/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "short_description": "20X40X555"
              }
            }
          ]
        }
      }
    }
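
To see why the original wildcard query misses and why the custom analyzer fixes it, here is a rough sketch in plain Python. It is not Elasticsearch code: `standard_like_tokens` is only a crude approximation of the standard analyzer, `custom_analyzer_token` emulates the `asterisk_to_x` char filter plus the keyword tokenizer and lowercase filter from the mapping above, and all function names are made up for illustration.

```python
import fnmatch
import re


def standard_like_tokens(text):
    """Crude stand-in for the standard analyzer (assumption, not the real
    tokenizer): lowercase, then split on anything that is not a letter or digit."""
    return re.findall(r"[0-9a-z]+", text.lower())


def custom_analyzer_token(text):
    """Emulates the custom analyzer: replace '*' between two non-space
    characters with 'x', keep the whole value as one token, lowercase it."""
    return re.sub(r"(\S)\*(\S)", r"\1x\2", text).lower()


def wildcard_hits(tokens, pattern):
    """A wildcard pattern is tested against each indexed term separately."""
    return any(fnmatch.fnmatchcase(token, pattern) for token in tokens)


if __name__ == "__main__":
    pattern = "*20?40?555*"

    # With '*' in the value, the text is split and no single term matches.
    print(standard_like_tokens("20*40x555"))
    print(wildcard_hits(standard_like_tokens("20*40x555"), pattern))

    # Without '*', the value stays one term and the wildcard matches.
    print(wildcard_hits(standard_like_tokens("20x40x555"), pattern))

    # All input variants normalize to the same single token.
    for value in ["20X40X555", "20*40*555", "20x40x555", "20*40x555"]:
        print(value, "->", custom_analyzer_token(value))
```

Note that the keyword tokenizer keeps the entire field value as one token, so the match query above only hits when the whole description normalizes to the same string as the search input. If the description can contain other words around the dimensions, a different tokenizer would be needed.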