I encountered a weird search behaviour while exploring opensearch's search engine. All the records within my index are as follows:
[{
"_index": "table1",
"_id": "AO4AnIYBC-oD5gl3Hm7W",
"_score": 1,
"_source": {
"period": "JAN-23",
"requestID": "10273376",
"header": {
"period": "JAN-23"
},
"status": "Complete"
}
},
{
"_index": "table1",
"_id": "gavmkoYB7MbgbX172uOM",
"_score": 1,
"_source": {
"period": "JAN-23",
"requestID": "100138128",
"header": {
"period": "JAN-23"
},
"status": "Complete"
}
},
{
"_index": "table1",
"_id": "g6vnkoYB7MbgbX17POOY",
"_score": 1,
"_source": {
"period": "FEB-23",
"requestID": "10246457",
"header": {
"period": "FEB-23"
},
"status": "Complete"
}
},
{
"_index": "table1",
"_id": "hKvnkoYB7MbgbX17XeOw",
"_score": 1,
"_source": {
"period": "JAN-23",
"requestID": "10273941",
"header": {
"period": "JAN-23"
},
"status": "Complete"
}
},
{
"_index": "table1",
"_id": "_-7nkoYBC-oD5gl3TW1Z",
"_score": 1,
"_source": {
"period": "FEB-23",
"requestID": "10254951",
"header": {
"period": "FEB-23"
},
"status": "Complete"
}
},
{
"_index": "table1",
"_id": "gqvnkoYB7MbgbX17JONH",
"_score": 1,
"_source": {
"period": "JAN-23",
"requestID": "10273376",
"header": {
"period": "JAN-23"
},
"status": "Complete"
}
}
]
Here are some of the results that I am getting When querying this data
Query 1: Returns Correct Info
GET /table1/_search
{
"query": {
"match": {
"status": "Complete"
}
}
}
Query 2: Returns all records, which is wrong ideally it should only return 4 records
GET /table1/_search
{
"query": {
"match": {
"period": "JAN-23"
}
}
}
Query 3: Returns 4 records with period : JAN-23
which is again wrong as it should now return 0 records
GET /table1/_search
{
"query": {
"match": {
"period": "JAN-22"
}
}
}
Query 4: Returns all records, which is again wrong, as it should return 0
GET /table1/_search
{
"query": {
"match": {
"period": "DEC-23"
}
}
}
It would be really helpful if anyone can help me understand why is it so?
Thanks
The field period
in the index table1 is a text
due to which the value of this filed will be analysed and converted into multiple tokens, like JAN-23
will be converted to jan
and 23
, so while querying "period": "JAN-23"
, all the documents whose period field contains jan or 23 are returned. To search a text exactly, we can use term
query to search.
Mapping
"period": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }
Term Query which returned accurate results
GET /table1/_search
{
"query": {
"term": {
"period.keyword": {
"value": "DEC-22"
}
}
}
}