elasticsearch, kibana, querydsl

Query in Kibana doesn't return logs with Regexp


I have a field named log.file.path in Elasticsearch, and its value is /var/log/dev-collateral/uaa.2020-09-26.log. I want to retrieve all logs whose log.file.path field starts with /var/log/dev-collateral/uaa. I used the regexp query below, but it doesn't work.

{
    "regexp":{
        "log.file.path": "/var/log/dev-collateral/uaa.*"
    }
}

Solution

  • Let's see why it is not working. I indexed two documents from the Kibana UI, like this:

    PUT myindex/_doc/1
    {
      "log.file.path" : "/var/log/dev-collateral/uaa.2020-09-26.log"
    }
    
    PUT myindex/_doc/2
    {
      "log.file.path" : "/var/log/dev-collateral/uaa.2020-09-26.txt"
    }
    

    To see the tokens produced for the text in the log.file.path field, I ran it through the _analyze API:

    POST _analyze
    {
      "text": "/var/log/dev-collateral/uaa.2020-09-26.log"
    }
    

    It returns:

    {
      "tokens" : [
        {
          "token" : "var",
          "start_offset" : 1,
          "end_offset" : 4,
          "type" : "<ALPHANUM>",
          "position" : 0
        },
        {
          "token" : "log",
          "start_offset" : 5,
          "end_offset" : 8,
          "type" : "<ALPHANUM>",
          "position" : 1
        },
        {
          "token" : "dev",
          "start_offset" : 9,
          "end_offset" : 12,
          "type" : "<ALPHANUM>",
          "position" : 2
        },
        {
          "token" : "collateral",
          "start_offset" : 13,
          "end_offset" : 23,
          "type" : "<ALPHANUM>",
          "position" : 3
        },
        {
          "token" : "uaa",
          "start_offset" : 24,
          "end_offset" : 27,
          "type" : "<ALPHANUM>",
          "position" : 4
        },
        {
          "token" : "2020",
          "start_offset" : 28,
          "end_offset" : 32,
          "type" : "<NUM>",
          "position" : 5
        },
        {
          "token" : "09",
          "start_offset" : 33,
          "end_offset" : 35,
          "type" : "<NUM>",
          "position" : 6
        },
        {
          "token" : "26",
          "start_offset" : 36,
          "end_offset" : 38,
          "type" : "<NUM>",
          "position" : 7
        },
        {
          "token" : "log",
          "start_offset" : 39,
          "end_offset" : 42,
          "type" : "<ALPHANUM>",
          "position" : 8
        }
      ]
    }
    

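    As you can see, Elasticsearch split the input text into tokens when the document was indexed. This is because Elasticsearch applies the standard analyzer to text fields by default: it breaks the value into small tokens, removes punctuation, lowercases the text, and so on. A regexp query is matched against each indexed token, not against the original string, and no single token starts with /var/log/dev-collateral/uaa. That's why your current regexp query doesn't work. A query for a single token, on the other hand, does match: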

    GET myindex/_search
    {
      "query": {
        "match": {
          "log.file.path": "var"
        }
      }
    }
    

    This works, but in your case you need to match every log.file.path that starts with /var/log/dev-collateral/uaa. So what to do now? Don't apply an analyzer when indexing this field. The keyword type stores the string exactly as you provide it.

    Create the mapping with the keyword type:

    PUT myindex2/
    {
      "mappings": {
        "properties": {
          "log.file.path": {
            "type": "keyword"
          }
        }
      }
    }
    

    Index the documents:

    PUT myindex2/_doc/1
    {
      "log.file.path" : "/var/log/dev-collateral/uaa.2020-09-26.log"
    }
    
    PUT myindex2/_doc/2
    {
      "log.file.path" : "/var/log/dev-collateral/uaa.2020-09-26.txt"
    }
    

    Search with regexp. The pattern is now matched against the whole stored path, so both documents are returned:

    GET myindex2/_search
    {
      "query": {
        "regexp": {
          "log.file.path": "/var/log/dev-collateral/uaa.2020-09-26.*"
        }
      }
    }
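
The difference between the two mappings can be reproduced without a cluster. The following is only a rough Python sketch, not how Elasticsearch is implemented: `standard_like_tokens` and `regexp_hits` are hypothetical helpers, the tokenizer is a crude approximation of the standard analyzer (split on non-alphanumeric characters, then lowercase), and Python's `re.fullmatch` stands in for the way a regexp query has to match an entire indexed term. Lucene's regular expression syntax differs from Python's in general, but `.` and `*` behave the same way for this pattern.

```python
import re


def standard_like_tokens(text):
    # Crude approximation of the standard analyzer (an assumption for
    # illustration): split on non-alphanumeric characters and lowercase.
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def regexp_hits(pattern, terms):
    # A regexp query must match a whole indexed term, so fullmatch is used
    # here to model that anchoring.
    return [term for term in terms if re.fullmatch(pattern, term)]


path = "/var/log/dev-collateral/uaa.2020-09-26.log"
pattern = r"/var/log/dev-collateral/uaa.*"

# A text field indexes the individual tokens of the value.
text_terms = standard_like_tokens(path)

# A keyword field indexes the whole value as a single term.
keyword_terms = [path]

print(text_terms)
print(regexp_hits(pattern, text_terms))     # no token starts with "/"
print(regexp_hits(pattern, keyword_terms))  # the full path matches
```

With the tokens, the pattern has nothing to match because no single token contains a slash. With the single keyword term, the same pattern matches the full path, which is the behaviour the question expected.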