I'm new to grok in logstash and I have to parse the following log pattern.
Jul 26 09:46:37 abc-lb1 2016-07-26 09:46:37.245 +0200 abc-lb1 WF WARN UNRECOGNIZED_COOKIE 188.200.126.234 50011 10.50.51.25 443 global GLOBAL LOG NONE [Cookie\="_ga" Service-created\="769 days back" Reason\="No valid encrypted pair"] GET example.com/search.action?searchText\=EH-5H&token\=--0----EH-5H-- TLSv1.2 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36" 188.200.126.234 50011 "-" https://example.com/my-account/login
I need to know How to avoid a set of strings in GROK
In the above logs, repeated time-stamps could be seen, I need to know, how to avoid the strings like:
Jul 26 09:46:37 abc-lb1
Suppose you need only two fields that is 2016-07-26 09:46:37.245
and https://example.com/my-account/login
then your grok filter should be as follows:
grok{ match => {"message" => "%{TIMESTAMP_ISO8601:time} %{GREEDYDATA} %{URI:url}"} }
You will get the following output:
{
"time": [
[
"2016-07-26 09:46:37.245"
]
],
"url": [
[
"https://example.com/my-account/login"
]
]
}
Here you are avoiding the first few fields in your log line by directly starting off with 2016-07-26 09:46:37.245
and you are avoiding everything in between by not naming %{GREEDYDATA}
. If you name %{GREEDYDATA}
as %{GREEDYDATA:data}
then you will the output as follows:
{
"time": [
[
"2016-07-26 09:46:37.245"
]
],
"data": [
[
"+0200 abc-lb1 WF WARN UNRECOGNIZED_COOKIE 188.200.126.234 50011 10.50.51.25 443 global GLOBAL LOG NONE [Cookie\\="_ga" Service-created\\="769 days back" Reason\\="No valid encrypted pair"] GET example.com/search.action?searchText\\=EH-5H&token\\=--0----EH-5H-- TLSv1.2 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36" 188.200.126.234 50011 "-""
]
],
"url": [
[
"https://example.com/my-account/login"
]
]
}
Now you can apply the same steps to whichever fields you want to avoid.
you can debug the results here