I know this issue has been discussed in several other posts, but my case is a little different because of the constraints detailed below.
So this is my query:
GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "*bop-qa-io-135*",
            "default_field": "errors.message",
            "default_operator": "AND"
          }
        },
        {
          "range": {
            "updated_at": {
              "gte": "2023-07-03T00:00:00"
            }
          }
        },
        {
          "range": {
            "updated_at": {
              "lte": "2023-07-05T00:00:00"
            }
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 300
}
The type of errors.message is text. This query doesn't give me what I want; I know the standard analyzer is working behind the scenes here, splitting my hyphenated query into separate terms, etc.
My question is whether there's a way to make this query work under these constraints.
What I've already tried:

- Adding "analyzer": "keyword" to the query.
- I think there was something about putting everything in double quotes that should've worked, but I don't know how to do it here: there are already double quotes as part of the JSON syntax.
My ES version:
{
"name": "GYWR05J",
"cluster_name": "elasticsearch",
"cluster_uuid": "vFO2BdrzR0OLfPeVO9Rr-g",
"version": {
"number": "6.2.2",
"build_hash": "10b1edd",
"build_date": "2018-02-16T19:01:30.685723Z",
"build_snapshot": false,
"lucene_version": "7.2.1",
"minimum_wire_compatibility_version": "5.6.0",
"minimum_index_compatibility_version": "5.0.0"
},
"tagline": "You Know, for Search"
}
> I think there was something about putting everything in double-quotes that should've worked but I don't know how to do it here

You can escape - with \- and put everything in quotes by escaping them with \". You will get something like this:

"query": "\"*bop\\-qa\\-io\\-135*\""

However, it will not help you, because the query_string query doesn't support wildcards inside phrases: you can use either wildcards or phrases there, but not both.
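For reference, the fully escaped request would look something like this (shown only to illustrate the escaping; as explained above, it still won't behave as a wildcard search):

GET /test/_search
{
  "query": {
    "query_string": {
      "query": "\"*bop\\-qa\\-io\\-135*\"",
      "default_field": "errors.message"
    }
  }
}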
Unfortunately, if you cannot reindex, the solution is not going to be simple. First, run your search term through _analyze to see which tokens are generated:
POST test/_analyze
{
"field": "errors.message",
"text": ["bop-qa-io-135"]
}
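With the standard analyzer, this should return four tokens, roughly like the following (offsets and types may differ if your field is configured differently):

{
  "tokens": [
    { "token": "bop", "start_offset": 0, "end_offset": 3, "type": "<ALPHANUM>", "position": 0 },
    { "token": "qa", "start_offset": 4, "end_offset": 6, "type": "<ALPHANUM>", "position": 1 },
    { "token": "io", "start_offset": 7, "end_offset": 9, "type": "<ALPHANUM>", "position": 2 },
    { "token": "135", "start_offset": 10, "end_offset": 13, "type": "<NUM>", "position": 3 }
  ]
}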
Then, from the generated tokens, you need to build a span_near query: wrap a wildcard query in span_multi for the first and last terms, and use span_term for all the terms in between. The terms must be in the form produced by the _analyze request. So for *bop-qa-io-135* we get:
POST test/_search
{
"query": {
"span_near": {
"clauses": [
{
"span_multi": {
"match": {
"wildcard": {
"errors.message": "*bop"
}
}
}
},
{
"span_term": {
"errors.message": "qa"
}
},
{
"span_term": {
"errors.message": "io"
}
},
{
"span_multi": {
"match": {
"wildcard": {
"errors.message": "135*"
}
}
}
}
],
"in_order": true
}
},
"from": 0,
"size": 300
}
If reindexing is an option, you can use an analyzer that is better suited to your type of text. There are numerous options: you could use the whitespace analyzer, for example, or a char filter that maps - to some character the analyzer doesn't split on, such as _. One side effect of the char-filter approach is that, because it is applied during both indexing and searching, a search for either - or _ will match both - and _. Note that the example below uses the typeless mapping syntax of Elasticsearch 7.x; on your 6.2.2 cluster you would need to nest properties under a mapping type (e.g. _doc) and include the type in the bulk request as well:
PUT test
{
"settings": {
"number_of_replicas": 0,
"number_of_shards": 1,
"analysis": {
"analyzer": {
"nonsplit_analyzer": {
"tokenizer": "standard",
"char_filter": [
"nonsplit_char_filter"
]
}
},
"char_filter": {
"nonsplit_char_filter": {
"type": "mapping",
"mappings": [
"- => _"
]
}
}
}
},
"mappings": {
"properties": {
"text": {
"type": "text",
"analyzer":"nonsplit_analyzer"
}
}
}
}
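As a quick sanity check (assuming the index was created as above), you can run the search term through the new analyzer; since the standard tokenizer does not split on underscores, the whole string should come back as a single token, bop_qa_io_135:

POST test/_analyze
{
  "analyzer": "nonsplit_analyzer",
  "text": ["bop-qa-io-135"]
}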
POST test/_bulk?refresh
{"index":{}}
{"text": "bazbop-qa-io-135678"}
{"index":{}}
{"text": "foobop_qa_io_135678"}
{"index":{}}
{"text": "foobop-qa-io-234567"}
{"index":{}}
{"text": "foobop qa io 135678"}
POST test/_search
{
"query": {
"query_string": {
"default_field": "text",
"query": "*bop-qa-io-135*"
}
}
}
POST test/_search
{
"query": {
"query_string": {
"default_field": "text",
"query": "*bop_qa_io_135*"
}
}
}
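If the char filter behaves as described above, both searches should return the same two hits, bazbop-qa-io-135678 and foobop_qa_io_135678: the - in the first query is mapped to _ at search time, making the two queries equivalent. The space-separated document is indexed as separate tokens, so the single wildcard term cannot match it, and the 234567 document doesn't contain 135.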