I am using a matchQuery to query Elasticsearch in Java. Below is my query:
sourceBuilder.query(QueryBuilders.matchQuery("TransactionId_s","BulkRunTest.20Nov20201446.00"));
The field TransactionId_s is not a keyword. I am expecting the matchQuery to match the exact string I have given and return only those results. There should be no documents in Elasticsearch with TransactionId_s equal to BulkRunTest.20Nov20201446.00, but I am getting some results, and they have TransactionId_s values like the ones below:
"TransactionId_s" : "BulkRunTest.17Sep20201222.00"
"TransactionId_s" : "BulkRunTest.22Sep20201450.00"
"TransactionId_s" : "BulkRunTest.20Sep20201250.00"
When I tried using a termQuery instead of matchQuery, I got 0 results, which is the expected result. I thought matchQuery would allow me to query any field for the given value without having to worry about tokenization. Am I wrong? And how do I resolve the issue I am seeing?
Any help would be much appreciated. Thank you.
Match queries are analyzed, i.e. by default the search term goes through the same analyzer that was applied to the field at index time. You can use the Analyze API to see the tokens produced for both the indexed values and the search term.
Assuming you have a text field with the default (standard) analyzer, it will generate the tokens below for the search term BulkRunTest.20Nov20201446.00:
POST /_analyze
{
"analyzer" : "standard",
"text" : "BulkRunTest.20Nov20201446.00"
}
And the generated tokens:
{
"tokens": [
{
"token": "bulkruntest", // notice this token
"start_offset": 0,
"end_offset": 11,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "20nov20201446.00",
"start_offset": 12,
"end_offset": 28,
"type": "<ALPHANUM>",
"position": 1
}
]
}
Now let's see the tokens for one of the matched documents, BulkRunTest.17Sep20201222.00:
POST /_analyze
{
"analyzer" : "standard",
"text" : "BulkRunTest.17Sep20201222.00"
}
And the generated tokens:
{
"tokens": [
{
"token": "bulkruntest", // notice same token
"start_offset": 0,
"end_offset": 11,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "17sep20201222.00",
"start_offset": 12,
"end_offset": 28,
"type": "<ALPHANUM>",
"position": 1
}
]
}
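As you can see, bulkruntest is produced as a token for both the indexed value and the search term. A match query combines its tokens with OR by default, so one shared token is enough for a document to match. That is why this document was returned, and the same holds for the other indexed documents.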
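To make the token overlap concrete, here is a minimal Java sketch. It does not call Elasticsearch or Lucene: analyze is a hypothetical, simplified stand-in for the standard analyzer (lowercase, then split on punctuation, keeping a '.' only when it sits between two digits or two letters), and matchWithOr imitates the default OR behaviour of a match query. It happens to reproduce the tokens shown above for these IDs, but it should not be treated as the real analyzer.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class MatchOverlapSketch {

    // Simplified approximation of the standard analyzer (an assumption, not the
    // real Lucene implementation): lowercase the text, then split on every
    // non-alphanumeric character, except a '.' between two digits or two letters.
    public static List<String> analyze(String text) {
        String s = text.toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (c == '.' && i > 0 && i + 1 < s.length()
                    && sameClass(s.charAt(i - 1), s.charAt(i + 1))) {
                current.append(c); // '.' stays inside the token
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // token boundary
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    private static boolean sameClass(char a, char b) {
        return (Character.isDigit(a) && Character.isDigit(b))
                || (Character.isLetter(a) && Character.isLetter(b));
    }

    // Imitates a match query with the default OR operator:
    // the document matches if it shares at least one token with the query.
    public static boolean matchWithOr(String query, String indexedValue) {
        Set<String> docTokens = new HashSet<>(analyze(indexedValue));
        for (String token : analyze(query)) {
            if (docTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String query = "BulkRunTest.20Nov20201446.00";
        String indexed = "BulkRunTest.17Sep20201222.00";

        System.out.println(analyze(query));               // [bulkruntest, 20nov20201446.00]
        System.out.println(analyze(indexed));             // [bulkruntest, 17sep20201222.00]
        System.out.println(matchWithOr(query, indexed));  // true, because "bulkruntest" is shared
        System.out.println(query.equals(indexed));        // false, an exact comparison does not match
    }
}
```

The last two lines show the difference the question ran into: token overlap says "match", while an exact comparison of the whole value says "no match".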
If you used the default auto-generated mapping and therefore have a .keyword subfield, you can use that .keyword field with a term query for the exact search.
Working example
{
"query": {
"term": { // term query
"TransactionId_s.keyword": { // .keyword subfield is used
"value": "BulkRunTest.20Nov20201446.00"
}
}
}
}
And the search result:
"hits": [
{
"_index": "test_in",
"_type": "_doc",
"_id": "2",
"_score": 0.6931471,
"_source": {
"TransactionId_s": "BulkRunTest.20Nov20201446.00"
}
}
]
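Since the question is about Java, here is a rough sketch of sending the same term query from Java using only the JDK's java.net.http classes (Java 11+). The base URL http://localhost:9200 is an assumption, the index name test_in is taken from the result above, and termQueryBody and searchRequest are hypothetical helper names. With the QueryBuilders API used in the question, the equivalent should be QueryBuilders.termQuery("TransactionId_s.keyword", "BulkRunTest.20Nov20201446.00").

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class KeywordTermQuerySketch {

    // Builds the JSON body of a term query on the given field.
    // Only backslashes and double quotes are escaped, which is enough for this sketch.
    public static String termQueryBody(String field, String value) {
        return "{\"query\":{\"term\":{\"" + escape(field)
                + "\":{\"value\":\"" + escape(value) + "\"}}}}";
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds (but does not send) a _search request.
    // baseUrl and index are assumptions about your cluster.
    public static HttpRequest searchRequest(String baseUrl, String index, String body) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // The .keyword subfield holds the whole, unanalyzed value.
        String body = termQueryBody("TransactionId_s.keyword", "BulkRunTest.20Nov20201446.00");
        HttpRequest request = searchRequest("http://localhost:9200", "test_in", body);

        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);
        // To execute against a running cluster:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

This only works if the mapping really has a .keyword subfield. If the field was mapped as plain text without one, the mapping has to be changed and the data reindexed before an exact term query will behave as expected.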