How to match exact document data in elasticsearch using DSL query?


My tokenizer:

 "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
 }

I am trying to search these fields by prefix. If I search with the token s, I should get the items that start with s. If I search with sp, I want only the items that start with sp, and everything else should be discarded. That is not what I get. Is my query wrong, or is the filter I have used wrong? Can someone please help me with this?

 {
     "query": {
      "bool": {
       "must": [
        {
         "multi_match": {
          "query": "PRODUCT",
          "fields": [
           "item",
           "data1"
          ]
         }
        },
        {
         "multi_match": {
          "query": "SUB_FAMILY",
          "fields": [
           "item",
           "data1"
          ]
         }
        },
        {
         "match": {
          "values": "SP"
         }
        }
       ]
      }
     }
    }

The output for this query is:

 "hits": [
                {
                    "_index": "logs_datas",
                    "_type": "_doc",
                    "_id": "H1PfEnkBQXpKNrJSp8bV",
                    "_score": 9.418445,
                    "_source": {
                        "message": "PRODUCT,SUB_FAMILY,SPRINHO2H",
                        "path": "/home/elasticsearchDatas.csv",
                        "hierarchy_name": "PRODUCT",
                        "@version": "1",
                        "@timestamp": "2021-04-27T10:28:37.578Z",
                        "host": "ewiglp71",
                        "item_pk": "SPRINHO2H",
                        "attribute_name": "SUB_FAMILY"
                    }
                },
                {
                    "_index": "logs_datas",
                    "_type": "_doc",
                    "_id": "y1PfEnkBQXpKNrJSp8XQ",
                    "_score": 5.3059187,
                    "_source": {
                        "message": "PRODUCT,SUB_FAMILY,SCMLPLWVI",
                        "path": "/home/niteshb/elasticsearchDatas.csv",
                        "hierarchy_name": "PRODUCT",
                        "@version": "1",
                        "@timestamp": "2021-04-27T10:28:37.577Z",
                        "host": "ewiglp71",
                        "item_pk": "SCMLPLWVI",
                        "attribute_name": "SUB_FAMILY"
                    }
                },
                {
                    "_index": "logs_datas",
                    "_type": "_doc",
                    "_id": "zFPfEnkBQXpKNrJSp8XQ",
                    "_score": 5.3059187,
                    "_source": {
                        "message": "PRODUCT,SUB_FAMILY,SSVRKEN2Z",
                        "path": "/home/elasticsearchDatas.csv",
                        "hierarchy_name": "PRODUCT",
                        "@version": "1",
                        "@timestamp": "2021-04-27T10:28:37.579Z",
                        "host": "ewiglp71",
                        "item_pk": "SSVRKEN2Z",
                        "attribute_name": "SUB_FAMILY"
                    }
                }
            ]
        }
    }

Solution

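  • Because min_gram is 1, the first token generated for every value is its first character. Unless the field has a separate search_analyzer, the query text SP is analyzed with the same tokenizer into S and SP, and the S token matches every item that starts with S. For example, the tokens generated for SCMLPLWVI are: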

    {
      "tokens": [
        {
          "token": "S",
          "start_offset": 0,
          "end_offset": 1,
          "type": "word",
          "position": 0
        },
        {
          "token": "SC",
          "start_offset": 0,
          "end_offset": 2,
          "type": "word",
          "position": 1
        },
        {
          "token": "SCM",
          "start_offset": 0,
          "end_offset": 3,
          "type": "word",
          "position": 2
        },
        {
          "token": "SCML",
          "start_offset": 0,
          "end_offset": 4,
          "type": "word",
          "position": 3
        },
        {
          "token": "SCMLP",
          "start_offset": 0,
          "end_offset": 5,
          "type": "word",
          "position": 4
        },
        {
          "token": "SCMLPL",
          "start_offset": 0,
          "end_offset": 6,
          "type": "word",
          "position": 5
        },
        {
          "token": "SCMLPLW",
          "start_offset": 0,
          "end_offset": 7,
          "type": "word",
          "position": 6
        },
        {
          "token": "SCMLPLWV",
          "start_offset": 0,
          "end_offset": 8,
          "type": "word",
          "position": 7
        },
        {
          "token": "SCMLPLWVI",
          "start_offset": 0,
          "end_offset": 9,
          "type": "word",
          "position": 8
        }
      ]
    }
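
The effect of these tokens on the original query can be sketched in a few lines of Python. This is only an emulation under stated assumptions, not Elasticsearch itself: `edge_ngrams` and `matches` are hypothetical helpers, the value is a single word of letters and digits, no lowercase filter is applied (the tokens above are uppercase), the same tokenizer runs at search time, and the match query uses its default OR operator.

```python
def edge_ngrams(text, min_gram=1, max_gram=10):
    """Emulate an edge_ngram tokenizer on one letters-and-digits word."""
    longest = min(len(text), max_gram)
    return [text[:n] for n in range(min_gram, longest + 1)]


def matches(query, value, min_gram=1, max_gram=10):
    """Emulate a match query: a hit if any query token equals an indexed token."""
    indexed = set(edge_ngrams(value, min_gram, max_gram))
    return any(tok in indexed for tok in edge_ngrams(query, min_gram, max_gram))


items = ["SPRINHO2H", "SCMLPLWVI", "SSVRKEN2Z"]

print(edge_ngrams("SP"))                                # ['S', 'SP']
print([item for item in items if matches("SP", item)])  # all three items
```

In this emulation the query token S is what pulls in SCMLPLWVI and SSVRKEN2Z, which mirrors the three hits shown in the question.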
    

    If you want to get only the values starting with sp, you need to modify your tokenizer as follows:

     "tokenizer": {
            "my_tokenizer": {
              "type": "edge_ngram",
              "min_gram": 2,          // note this
              "max_gram": 10,
              "token_chars": [
                "letter",
                "digit"
              ]
              }
     }
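
A rough way to see what raising min_gram changes is the Python sketch below. It is an emulation under assumptions, not Elasticsearch itself: `edge_ngrams` and `search` are hypothetical helpers, each value is a single word of letters and digits, and the same tokenizer is assumed to run on both the indexed value and the query text.

```python
def edge_ngrams(text, min_gram, max_gram=10):
    """Emulate an edge_ngram tokenizer on one letters-and-digits word."""
    longest = min(len(text), max_gram)
    return [text[:n] for n in range(min_gram, longest + 1)]


def search(query, items, min_gram):
    """Return the items that share at least one token with the query."""
    query_tokens = set(edge_ngrams(query, min_gram))
    return [i for i in items if query_tokens & set(edge_ngrams(i, min_gram))]


items = ["SPRINHO2H", "SCMLPLWVI", "SSVRKEN2Z"]

print(search("SP", items, min_gram=2))  # ['SPRINHO2H']
print(search("S", items, min_gram=2))   # []
```

Under these assumptions the sp search is narrowed to one item, but a one-character query such as S no longer produces any token, and a longer query such as SPX would still match SPRINHO2H through its SP token.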
    

    Update 1:

    You can use a match_bool_prefix query to search for words starting with s or sp. It analyzes the input and treats the last term as a prefix.

    Here is a working example.

    Index Mapping:

    {
      "mappings": {
        "properties": {
          "item_pk": {
            "type": "text"
          }
        }
      }
    }
    

    Search Query 1:

    {
      "query": {
        "match_bool_prefix" : {
          "item_pk" : "s"
        }
      }
    }
    

    Search Result:

    "hits": [
          {
            "_index": "67281810",
            "_type": "_doc",
            "_id": "1",
            "_score": 1.0,
            "_source": {
              "message": "PRODUCT,SUB_FAMILY,SPRINHO2H",
              "path": "/home/niteshb/elasticsearchDatas.csv",
              "hierarchy_name": "PRODUCT",
              "@version": "1",
              "@timestamp": "2021-04-27T10:28:37.578Z",
              "host": "ewiglp71",
              "item_pk": "SPRINHO2H",
              "attribute_name": "SUB_FAMILY"
            }
          },
          {
            "_index": "67281810",
            "_type": "_doc",
            "_id": "i7quE3kB6jKCA-nFYii6",
            "_score": 1.0,
            "_source": {
              "message": "PRODUCT,SUB_FAMILY,SCMLPLWVI",
              "path": "/home/niteshb/elasticsearchDatas.csv",
              "hierarchy_name": "PRODUCT",
              "@version": "1",
              "@timestamp": "2021-04-27T10:28:37.577Z",
              "host": "ewiglp71",
              "item_pk": "SCMLPLWVI",
              "attribute_name": "SUB_FAMILY"
            }
          },
          {
            "_index": "67281810",
            "_type": "_doc",
            "_id": "jLquE3kB6jKCA-nFgiju",
            "_score": 1.0,
            "_source": {
              "message": "PRODUCT,SUB_FAMILY,SSVRKEN2Z",
              "path": "/home/niteshb/elasticsearchDatas.csv",
              "hierarchy_name": "PRODUCT",
              "@version": "1",
              "@timestamp": "2021-04-27T10:28:37.579Z",
              "host": "ewiglp71",
              "item_pk": "SSVRKEN2Z",
              "attribute_name": "SUB_FAMILY"
            }
          }
        ]
    

    Search Query 2:

    {
      "query": {
        "match_bool_prefix" : {
          "item_pk" : "sp"
        }
      }
    }
    

    Search Result:

    "hits": [
          {
            "_index": "67281810",
            "_type": "_doc",
            "_id": "1",
            "_score": 1.0,
            "_source": {
              "message": "PRODUCT,SUB_FAMILY,SPRINHO2H",
              "path": "/home/niteshb/elasticsearchDatas.csv",
              "hierarchy_name": "PRODUCT",
              "@version": "1",
              "@timestamp": "2021-04-27T10:28:37.578Z",
              "host": "ewiglp71",
              "item_pk": "SPRINHO2H",
              "attribute_name": "SUB_FAMILY"
            }
          }
        ]
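
The behaviour of the two searches above can be approximated in Python as follows. This is a simplified sketch, not the actual implementation: `standard_terms` and `match_bool_prefix` are hypothetical helpers, the standard analyzer is reduced to lowercasing and whitespace splitting, and the default OR operator is assumed.

```python
def standard_terms(text):
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def match_bool_prefix(query, value):
    """Every term except the last must equal an indexed term to count;
    the last term counts if any indexed term starts with it. Clauses are ORed."""
    terms = standard_terms(query)
    if not terms:
        return False
    *exact, last = terms
    indexed = standard_terms(value)
    clauses = [term in indexed for term in exact]
    clauses.append(any(tok.startswith(last) for tok in indexed))
    return any(clauses)


items = ["SPRINHO2H", "SCMLPLWVI", "SSVRKEN2Z"]

print([i for i in items if match_bool_prefix("s", i)])   # all three items
print([i for i in items if match_bool_prefix("sp", i)])  # ['SPRINHO2H']
```

Because the prefix is applied at query time, the field can stay a plain text field, as in the index mapping of the working example.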
    

    Update 2:

    Try this query, which combines match clauses on hierarchy_name and attribute_name with a prefix match on item_pk:

    {
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "hierarchy_name": "PRODUCT"
              }
            },
            {
              "match": {
                "attribute_name": "SUB_FAMILY"
              }
            },
            {
              "match_bool_prefix": {
                "item_pk": "sp"
              }
            }
          ]
        }
      }
    }