I want to search for a substring in a document column using Elasticsearch. The column contains strings that are exactly 255 characters long, and I want to find occurrences of a substring at a specified position. For example, I want to find the substring "ABC" at character positions 5-7 of the string (counting from 1). Thus xxxxABCxxxxx... is a valid match but xxABCxxxxx... is NOT.
The wildcard query can find substrings, but written as below it does not restrict the match to a fixed position.
{
  "query": {
    "wildcard": {
      "String Name": {
        "value": "*ABC*"
      }
    }
  }
}
How do I formulate this query in Python?
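Could you use the regexp query to achieve this? Note that a regexp query is matched against whole indexed terms, so this assumes the field is indexed as a keyword (not analyzed) and each term is the full 255-character string.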
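from elasticsearch import Elasticsearch

es = Elasticsearch(...)

# Lucene regexps must match the entire value and do not support the "^" anchor:
# skip any 4 characters, match "ABC" at positions 5-7, then allow the rest.
resp = es.search(
    index="index-name",
    body={
        "query": {
            "regexp": {
                "String Name": {
                    "value": ".{4}ABC.*"
                }
            }
        }
    },
)
print(resp)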
The regexp query only runs when search.allow_expensive_queries is true. That is the default, so you only need to change it if it has been disabled on your cluster.
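If the substring and position vary, the pattern can be built programmatically. The following is a minimal sketch, not part of the client library: the helper names `escape_lucene`, `positional_regexp` and `positional_query` are hypothetical, and the reserved-character list is taken from the Elasticsearch regular expression syntax documentation, so check it against your version. The final check uses Python's `re.fullmatch` only as a local approximation of Lucene's whole-value matching.

```python
import re

# Characters that Lucene's regexp syntax treats as operators
# (assumed from the Elasticsearch regular expression syntax docs).
LUCENE_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene(text):
    """Backslash-escape Lucene regexp operators so text is matched literally."""
    return "".join("\\" + ch if ch in LUCENE_RESERVED else ch for ch in text)


def positional_regexp(substring, start):
    """Build a pattern for values containing substring at 1-based position start."""
    if start < 1:
        raise ValueError("start is 1-based and must be >= 1")
    # Skip exactly start - 1 characters; no prefix is needed when start == 1.
    prefix = f".{{{start - 1}}}" if start > 1 else ""
    # The trailing ".*" is required because the pattern must match the whole value.
    return f"{prefix}{escape_lucene(substring)}.*"


def positional_query(field, substring, start):
    """Wrap the pattern in a regexp query body suitable for es.search(body=...)."""
    return {"query": {"regexp": {field: {"value": positional_regexp(substring, start)}}}}


pattern = positional_regexp("ABC", 5)
print(pattern)  # .{4}ABC.*

# Local sanity check with Python's regex engine (an approximation of Lucene's).
print(bool(re.fullmatch(pattern, "xxxxABCxxxxx")))  # True
print(bool(re.fullmatch(pattern, "xxABCxxxxx")))    # False
```

The resulting dictionary can be passed as `body=positional_query("String Name", "ABC", 5)` in the `es.search` call above.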
<disclosure: I'm maintainer of the Python Elasticsearch clients and employed by Elastic>