Tags: logstash, jira, logstash-grok, elk, atlassian-crowd

Creating a custom GROK pattern


I'm currently trying to create a grok pattern for this log line:

2020-03-11 05:54:26,174 JMXINSTRUMENTS-Threading [{"timestamp":"1583906066","label":"Threading","ObjectName":"java.lang:type\u003dThreading","attributes":[{"name":"CurrentThreadUserTime","value":18600000000},{"name":"ThreadCount","value":152},{"name":"TotalStartedThreadCount","value":1138},{"name":"CurrentThreadCpuTime","value":20804323112},{"name":"PeakThreadCount","value":164},{"name":"DaemonThreadCount","value":136}]}]

At the moment I can match everything up to JMXINSTRUMENTS-Threading with this pattern:

%{TIMESTAMP_ISO8601:timestamp} (?<instrument>[^\ ]*) ?%{GREEDYDATA:log_message}

But I can't seem to match the values after that. Does anybody have an idea which pattern I should use?


Solution

  • I tried your pattern in https://grokdebug.herokuapp.com/ (an online grok debugger), and it does match: everything after "JMXINSTRUMENTS-Threading" is captured in one big field called log_message, like this:

    {
      "timestamp": [
        [
          "2020-03-11 05:54:26,174"
        ]
      ],
      "YEAR": [
        [
          "2020"
        ]
      ],
      "MONTHNUM": [
        [
          "03"
        ]
      ],
      "MONTHDAY": [
        [
          "11"
        ]
      ],
      "HOUR": [
        [
          "05",
          null
        ]
      ],
      "MINUTE": [
        [
          "54",
          null
        ]
      ],
      "SECOND": [
        [
          "26,174"
        ]
      ],
      "ISO8601_TIMEZONE": [
        [
          null
        ]
      ],
      "instrument": [
        [
          "JMXINSTRUMENTS-Threading"
        ]
      ],
      "log_message": [
        [
          "[{"timestamp":"1583906066","label":"Threading","ObjectName":"java.lang:type\\u003dThreading","attributes":[{"name":"CurrentThreadUserTime","value":18600000000},{"name":"ThreadCount","value":152},{"name":"TotalStartedThreadCount","value":1138},{"name":"CurrentThreadCpuTime","value":20804323112},{"name":"PeakThreadCount","value":164},{"name":"DaemonThreadCount","value":136}]}]"
        ]
      ]
    }
    

    If you want to extract the fields contained in log_message, use a json filter in the filter section of your Logstash pipeline, right below your grok filter.

    For example:

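      grok {
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} (?<instrument>[^\ ]*) ?%{GREEDYDATA:log_message}" }
        tag_on_failure => ["no_match"]
      }
      if "no_match" not in [tags] {
        json {
          source => "log_message"
          # the payload's top level is a JSON array, so the json filter needs a target
          # ("jmx" is only an illustrative field name)
          target => "jmx"
        }
      }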
    

    That way the JSON payload is parsed into structured fields instead of staying one long string.
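To see what the grok-then-json approach produces, here is a minimal Python sketch of the same two steps. It is not Logstash code: the regex is only a rough stand-in for the grok pattern, `SAMPLE` is a shortened copy of the log line from the question, and the names `parse_line`, `attributes_by_name` and the `jmx` key are assumptions made for illustration.

```python
import json
import re

# Shortened copy of the log line from the question (only two attributes kept).
SAMPLE = (
    r'2020-03-11 05:54:26,174 JMXINSTRUMENTS-Threading '
    r'[{"timestamp":"1583906066","label":"Threading",'
    r'"ObjectName":"java.lang:type\u003dThreading",'
    r'"attributes":[{"name":"ThreadCount","value":152},'
    r'{"name":"PeakThreadCount","value":164}]}]'
)

# Rough stand-in for:
# %{TIMESTAMP_ISO8601:timestamp} (?<instrument>[^\ ]*) ?%{GREEDYDATA:log_message}
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<instrument>[^ ]*) ?(?P<log_message>.*)$"
)


def parse_line(line):
    """Split the line like the grok step, then parse the JSON remainder."""
    match = LINE_RE.match(line)
    if match is None:
        return None  # comparable to the event being tagged "no_match"
    event = match.groupdict()
    # The payload is a JSON array, so it is stored under its own key
    # (hypothetical name "jmx") rather than merged into the event.
    event["jmx"] = json.loads(event["log_message"])
    return event


def attributes_by_name(event):
    """Flatten the name/value pairs of every entry into one dict."""
    return {
        attr["name"]: attr["value"]
        for entry in event["jmx"]
        for attr in entry["attributes"]
    }


if __name__ == "__main__":
    parsed = parse_line(SAMPLE)
    print(parsed["instrument"])
    print(attributes_by_name(parsed))
```

The point of the sketch is that the pattern only has to isolate the JSON text. The nested name/value pairs come from the JSON parser, not from a more elaborate pattern.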

    EDIT:

    You could also try a kv filter instead of json; here are the docs: https://www.elastic.co/guide/en/logstash/current/plugins-filters-kv.html

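      grok {
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} (?<instrument>[^\ ]*) ?%{GREEDYDATA:log_message}" }
        tag_on_failure => ["no_match"]
      }
      if "no_match" not in [tags] {
        kv {
          source => "log_message"
          value_split => ":"         # key and value are separated by a colon
          include_brackets => true   # treat brackets as value wrappers and strip them
          remove_char_key => "\""    # drop the double quotes around keys
          remove_char_value => "\""  # drop the double quotes around values
          field_split => ","         # pairs are separated by commas
        }
      }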