Tags: python, elasticsearch, substring, elasticsearch-dsl, elasticsearch-query

A particular case of an Elasticsearch substring query


I want to search for a substring in a document field using Elasticsearch. The field contains strings that are exactly 255 characters long, and I only want matches where the substring occurs at a specified position. For example, I want to find the substring "ABC" at character positions 5-7 of the string (positions are 1-based). Thus xxxxABCxxxxx... is a valid match, but xxABCxxxxx... is NOT.
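In plain Python terms, the position rule means that with 1-based positions 5-7 the substring must occupy the 0-based slice [4:7]. The helper below, has_substring_at, is a hypothetical name; a check like this can be handy for double-checking search hits on the client side.

```python
def has_substring_at(text: str, sub: str, start: int) -> bool:
    """Return True if `sub` occupies 1-based positions start..start+len(sub)-1 of `text`."""
    if start < 1:
        raise ValueError("start is 1-based and must be >= 1")
    # Convert the 1-based start position to a 0-based slice index.
    begin = start - 1
    return text[begin:begin + len(sub)] == sub


print(has_substring_at("xxxxABCxxxxx", "ABC", 5))  # True: ABC at positions 5-7
print(has_substring_at("xxABCxxxxx", "ABC", 5))    # False: ABC at positions 3-5
```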

The wildcard query below finds the substring anywhere in the string, so it does not restrict the match to a fixed position:

{
    "query": {
        "wildcard": {
           "String Name": {
              "value": "*ABC*"
           }
        }
    }
}
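To see the difference locally, Python's fnmatch.fnmatchcase uses the same * (any run of characters) and ? (exactly one character) semantics as the Elasticsearch wildcard query for simple patterns like these. This is only an approximation, since fnmatch also treats [ ] as special. The sketch shows that *ABC* accepts both strings, while four ? wildcards ahead of ABC pin it to positions 5-7.

```python
from fnmatch import fnmatchcase

valid = "xxxxABCxxxxx"  # ABC at positions 5-7
invalid = "xxABCxxxxx"  # ABC at positions 3-5

# "*ABC*" matches ABC anywhere, so it cannot tell the two strings apart.
print(fnmatchcase(valid, "*ABC*"), fnmatchcase(invalid, "*ABC*"))  # True True

# Each "?" consumes exactly one character, so ABC must sit at positions 5-7.
print(fnmatchcase(valid, "????ABC*"), fnmatchcase(invalid, "????ABC*"))  # True False
```

The corresponding Elasticsearch wildcard value would be "????ABC*", run against a keyword field so that the whole string is a single term.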

How do I formulate this query in Python?


Solution

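  • You can use the regexp query for this. Lucene regular expressions, which Elasticsearch uses, are always anchored: the pattern must match the entire indexed term, so the leading ^ common in other regex engines is unnecessary (Lucene treats it as a literal character). The pattern therefore has to account for the characters after ABC as well.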

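    from elasticsearch import Elasticsearch
    
    es = Elasticsearch(...)  # your cluster URL / connection settings
    resp = es.search(
        index="index-name",
        # elasticsearch-py 8.x style; 7.x clients take body={"query": {...}}
        query={
            "regexp": {
                # Assumes a keyword sub-field (as created by dynamic mapping);
                # use "String Name" if the field itself is mapped as keyword.
                "String Name.keyword": {
                    # Whole-term match: any 4 chars, ABC at positions 5-7, then anything.
                    "value": ".{4}ABC.*"
                }
            }
        },
    )
    print(resp)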
    

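    Note that regexp is an expensive query: it will not run if the cluster setting search.allow_expensive_queries is set to false (the default is true). The pattern is also matched against indexed terms, so query a keyword field; on an analyzed text field the string is split into lowercased tokens, and a positional pattern will not match as expected.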
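To generalize the query to any substring and start position, a small builder can compute the .{n} prefix and escape the characters Lucene's regexp syntax reserves. This is a sketch under assumptions: positional_regexp_query and escape_lucene_regexp are hypothetical names, the default field "String Name.keyword" assumes a keyword sub-field, and the escape set assumes the default ALL flags, under which # @ & < > ~ are operators too.

```python
import re

# Characters reserved by Lucene regexp syntax when all optional operators are enabled.
LUCENE_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene_regexp(text: str) -> str:
    """Backslash-escape every reserved character so it is matched literally."""
    return "".join("\\" + ch if ch in LUCENE_RESERVED else ch for ch in text)


def positional_regexp_query(sub: str, start: int, field: str = "String Name.keyword") -> dict:
    """Build a regexp query for terms holding `sub` at 1-based position `start`.

    Hypothetical helper; the default field assumes a keyword sub-field.
    """
    if start < 1:
        raise ValueError("start is 1-based and must be >= 1")
    # Lucene patterns must match the whole term: skip start-1 characters,
    # match the literal substring, then allow any remainder.
    prefix = ".{%d}" % (start - 1) if start > 1 else ""
    pattern = prefix + escape_lucene_regexp(sub) + ".*"
    return {"regexp": {field: {"value": pattern}}}


query = positional_regexp_query("ABC", 5)
print(query)  # {'regexp': {'String Name.keyword': {'value': '.{4}ABC.*'}}}

# Local sanity check: re.fullmatch mimics Lucene's whole-term anchoring for
# this simple pattern (DOTALL because Lucene's "." also matches newlines).
value = query["regexp"]["String Name.keyword"]["value"]
print(bool(re.fullmatch(value, "xxxxABCxxxxx", re.DOTALL)))  # True
print(bool(re.fullmatch(value, "xxABCxxxxx", re.DOTALL)))    # False
```

The returned dict can be passed as the query argument of es.search in place of the hand-written one.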

    (Disclosure: I'm a maintainer of the Python Elasticsearch clients and am employed by Elastic.)