Search code examples
opensearchamazon-opensearch

OpenSearch Weird Search Behaviour


I encountered a weird search behaviour while exploring opensearch's search engine. All the records within my index are as follows:

[{
        "_index": "table1",
        "_id": "AO4AnIYBC-oD5gl3Hm7W",
        "_score": 1,
        "_source": {
            "period": "JAN-23",
            "requestID": "10273376",
            "header": {
                "period": "JAN-23"
            },
            "status": "Complete"
        }
    },
    {
        "_index": "table1",
        "_id": "gavmkoYB7MbgbX172uOM",
        "_score": 1,
        "_source": {
            "period": "JAN-23",
            "requestID": "100138128",
            "header": {
                "period": "JAN-23"
            },
            "status": "Complete"
        }
    },
    {
        "_index": "table1",
        "_id": "g6vnkoYB7MbgbX17POOY",
        "_score": 1,
        "_source": {
            "period": "FEB-23",
            "requestID": "10246457",
            "header": {
                "period": "FEB-23"
            },
            "status": "Complete"
        }
    },
    {
        "_index": "table1",
        "_id": "hKvnkoYB7MbgbX17XeOw",
        "_score": 1,
        "_source": {
            "period": "JAN-23",
            "requestID": "10273941",
            "header": {
                "period": "JAN-23"
            },
            "status": "Complete"
        }
    },
    {
        "_index": "table1",
        "_id": "_-7nkoYBC-oD5gl3TW1Z",
        "_score": 1,
        "_source": {
            "period": "FEB-23",
            "requestID": "10254951",
            "header": {
                "period": "FEB-23"
            },
            "status": "Complete"
        }
    },
    {
        "_index": "table1",
        "_id": "gqvnkoYB7MbgbX17JONH",
        "_score": 1,
        "_source": {
            "period": "JAN-23",
            "requestID": "10273376",
            "header": {
                "period": "JAN-23"
            },
            "status": "Complete"
        }
    }
]


Here are some of the results that I am getting When querying this data

Query 1: Returns Correct Info

GET /table1/_search
{
  "query": {
    "match": {
      "status": "Complete"
    }
  }
}

Query 2: Returns all records, which is wrong ideally it should only return 4 records

GET /table1/_search
{
  "query": {
    "match": {
      "period": "JAN-23"
    }
  }
}

Query 3: Returns 4 records with period : JAN-23 which is again wrong as it should now return 0 records

GET /table1/_search
{
  "query": {
    "match": {
      "period": "JAN-22"
    }
  }
}

Query 4: Returns all records, which is again wrong, as it should return 0

GET /table1/_search
{
  "query": {
    "match": {
      "period": "DEC-23"
    }
  }
}

It would be really helpful if anyone can help me understand why is it so?

Thanks


Solution

  • The field period in the index table1 is a text due to which the value of this filed will be analysed and converted into multiple tokens, like JAN-23 will be converted to jan and 23, so while querying "period": "JAN-23", all the documents whose period field contains jan or 23 are returned. To search a text exactly, we can use term query to search.

    Mapping

    "period": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }
    

    Term Query which returned accurate results

    GET /table1/_search
    {
       "query": {
        "term": {
          "period.keyword": {
            "value": "DEC-22"
          }
        }
       }
    }