Tags: logstash, logstash-grok, elastic-stack, grok

How do we escape a set of strings or characters in grok?


I'm new to grok in Logstash and I have to parse the following log line:

Jul 26 09:46:37 abc-lb1 2016-07-26 09:46:37.245 +0200  abc-lb1 WF WARN UNRECOGNIZED_COOKIE 188.200.126.234 50011 10.50.51.25 443 global GLOBAL LOG NONE [Cookie\="_ga" Service-created\="769 days back" Reason\="No valid encrypted pair"] GET example.com/search.action?searchText\=EH-5H&token\=--0----EH-5H-- TLSv1.2 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36" 188.200.126.234 50011 "-" https://example.com/my-account/login 

I need to know how to skip a set of strings in grok.

The log line above contains a repeated timestamp and host name. How can I skip leading strings like this one:

Jul 26 09:46:37 abc-lb1


Solution

  • Suppose you need only two fields, namely 2016-07-26 09:46:37.245 and https://example.com/my-account/login. Then your grok filter should be as follows:

    grok{ match => {"message" => "%{TIMESTAMP_ISO8601:time} %{GREEDYDATA} %{URI:url}"} }
    

    You will get the following output:

    {
      "time": [
        [
          "2016-07-26 09:46:37.245"
        ]
      ],
      "url": [
        [
          "https://example.com/my-account/login"
        ]
      ]
    }
    

    Here you skip the first few fields of the log line because the pattern is not anchored to the start of the line, so matching begins directly at 2016-07-26 09:46:37.245. You skip everything in between by leaving %{GREEDYDATA} unnamed. If you name it, as in %{GREEDYDATA:data}, you will get the following output:

    {
      "time": [
        [
          "2016-07-26 09:46:37.245"
        ]
      ],
      "data": [
        [
          "+0200  abc-lb1 WF WARN UNRECOGNIZED_COOKIE 188.200.126.234 50011 10.50.51.25 443 global GLOBAL LOG NONE [Cookie\\="_ga" Service-created\\="769 days back" Reason\\="No valid encrypted pair"] GET example.com/search.action?searchText\\=EH-5H&token\\=--0----EH-5H-- TLSv1.2 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36" 188.200.126.234 50011 "-""
        ]
      ],
      "url": [
        [
          "https://example.com/my-account/login"
        ]
      ]
    }
    

    Now you can apply the same steps to whichever fields you want to skip.

    You can check the results in a grok debugger.
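
As a variation on the approach above, the leading prefix can also be matched explicitly rather than skipped by an unanchored search. The sketch below is an assumption, not part of the original answer: it uses the stock SYSLOGTIMESTAMP and SYSLOGHOST patterns for "Jul 26 09:46:37 abc-lb1", leaves them unnamed so they are matched but not stored, and anchors the match with ^.

```
filter {
  grok {
    # Unnamed patterns (SYSLOGTIMESTAMP, SYSLOGHOST, GREEDYDATA) must match
    # but produce no fields; only "time" and "url" are captured.
    match => {
      "message" => "^%{SYSLOGTIMESTAMP} %{SYSLOGHOST} %{TIMESTAMP_ISO8601:time} %{GREEDYDATA} %{URI:url}"
    }
  }
}
```

With the anchor in place, a line that lacks the syslog prefix should fail to match and be tagged _grokparsefailure instead of matching somewhere in the middle of the line, which may or may not be what you want.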