
Writing a grok pattern for key-value pairs


 "processors" : [
      {
        "grok": {
          "field": "log",
          "patterns": ["%{TIME_STAMP:ts} %{GREEDYDATA:logtail}"],
          "pattern_definitions" : {
             "TIME_STAMP" : "%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}"
          },
          "ignore_failure" : true,
          "ignore_missing" : true
        }
      },
      {
        "kv" : {
          "field": "logtail",
          "field_split": "\\s(?![^=]+?(\\s|$))",
          "value_split": "=",
          "ignore_failure" : true
        }
      },
      {
        "remove" : {
          "field": "logtail",
          "ignore_failure" : true
        }
      },
      {
        "date" : {
          "field" : "ts",
          "formats" : ["yyyy-MM-dd HH:mm:ss,SSS"],
          "ignore_failure" : true
        }
      }
  ]
}

Above is our ingest pipeline.

Normally our logs are nice and clean. For example:

    2024-09-24 15:07:59,572 level=INFO channel=wsgi.request method=GET path=/health/ user_agent="ELB-HealthChecker/2.0" request_action=finish duration=0.005 status=200 content_length=26

This works perfectly, but if the log contains an extra = inside a value, all hell breaks loose!

    2024-09-24 15:07:59,572 level=INFO channel=wsgi.request method=GET path=/job?id=12345 user_agent="ELB-HealthChecker/2.0" request_action=finish duration=0.005 status=200 content_length=26

This seems like it must be a very common use case; is there an off-the-shelf fix for it?
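The _simulate API is handy for reproducing this. The request below is just the pipeline from above with the ignore_failure / ignore_missing flags dropped, so that the kv failure actually shows up in the response instead of being silently swallowed:

    POST _ingest/pipeline/_simulate
    {
      "pipeline": {
        "processors": [
          {
            "grok": {
              "field": "log",
              "patterns": ["%{TIME_STAMP:ts} %{GREEDYDATA:logtail}"],
              "pattern_definitions": {
                "TIME_STAMP": "%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}"
              }
            }
          },
          {
            "kv": {
              "field": "logtail",
              "field_split": "\\s(?![^=]+?(\\s|$))",
              "value_split": "="
            }
          },
          { "remove": { "field": "logtail" } },
          { "date": { "field": "ts", "formats": ["yyyy-MM-dd HH:mm:ss,SSS"] } }
        ]
      },
      "docs": [
        { "_source": { "log": "2024-09-24 15:07:59,572 level=INFO channel=wsgi.request method=GET path=/health/ user_agent=\"ELB-HealthChecker/2.0\" request_action=finish duration=0.005 status=200 content_length=26" } },
        { "_source": { "log": "2024-09-24 15:07:59,572 level=INFO channel=wsgi.request method=GET path=/job?id=12345 user_agent=\"ELB-HealthChecker/2.0\" request_action=finish duration=0.005 status=200 content_length=26" } }
      ]
    }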


Solution

  • Maybe the field split from the Fortinet Filebeat module is a better approach. At least it works for both of your examples:

    "field_split": " (?=[a-z\\_\\-]+=)"
    

    This splits only on a space that is immediately followed by something that looks like a key (lowercase letters, underscores, or hyphens) and an =, rather than looking ahead to where the next pair ends.

    See: https://github.com/elastic/beats/blob/master/x-pack/filebeat/module/fortinet/firewall/ingest/pipeline.yml#L6-L17
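    Dropped into the kv processor from the question, that would look something like this (a sketch: the key pattern [a-z\\_\\-]+ assumes keys made up of lowercase letters, underscores, and hyphens, so widen the character class if your keys can contain digits or uppercase letters):

    {
      "kv" : {
        "field": "logtail",
        "field_split": " (?=[a-z\\_\\-]+=)",
        "value_split": "=",
        "ignore_failure" : true
      }
    }

    Re-running the _simulate request from the question with this kv block is a quick way to confirm that both sample lines parse.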