
Match multiple fields and variable fields in grok


I'm working on a project where I have to parse error log files from an Apache server.

I'm building a grok pattern to scan these error files.

At the moment, this is my pattern:

\[(?<timestamp>%{DAY:day} %{MONTH:month} %{MONTHDAY} %{TIME} %{YEAR})\]\s\[:%{LOGLEVEL:loglevel}\]\s\[pid %{NUMBER:pid}]\s\[client %{IP:clientip}:.*]\s\[client %{IP:clientip2}.*\]\sModSecurity:\s%{WORD:modSecurity}.\s(%{GREEDYDATA:error}\.\s)\[file\s%{QS:path_file}\]\s\[line %{QS:line}]\s\[id %{QS:id}\]\s\[msg %{QS:message}\]\s{0,1}(\[data %{QS:data}\])\s{0,1}(\[severity %{QS:severity}\])\s{0,1}(\[ver %{QS:ver}\])\s{1,5}(\[tag %{QS:tag})\].*\[hostname %{QS:hostname}\]\s\[uri %{QS:uri}\]\s\[unique_id %{QS:unique_id}\]\,\sreferer:\s%{URI:referer}

Here are two example lines from the log files:

[Wed Aug 25 12:55:58.601261 2021] [:error] [pid 20282] [client 83.216.165.253:59075] [client 83.216.165.253] ModSecurity: Warning. Match of "rx (?:\\\\x1f\\\\x8b\\\\x08|\\\\b(?:(?:i(?:nterplay|hdr|d3)|m(?:ovi|thd)|r(?:ar!|iff)|(?:ex|jf)if|f(?:lv|ws)|varg|cws)\\\\b|gif)|B(?:%pdf|\\\\.ra)\\\\b|^wOF(?:F|2))" against "RESPONSE_BODY" required. [file "/usr/share/modsecurity-crs/rules/RESPONSE-953-DATA-LEAKAGES-PHP.conf"] [line "105"] [id "953120"] [msg "PHP source code leakage"] [data "Matched Data: <? found within RESPONSE_BODY: \\x03\\xa4<\\x11U\\xb5\\x1f\\x0e\\xa0\\x91\\xb2p\\xfe~\\xff\\x9b\\xa9\\xd5\\xdd\\x97\\x13\\x0ay\\xb1\\xa5j\\x92\\xc2B\\xee\\xa1S\\xdb\\x9a^R=[\\x9c\\xd1\\xfb\\x04>$\\xc4 \\xc1\\x22@Y\\x8aJ\\xb7\\xfb\\x5c\\xee\\xe3\\x93\\xd9\\xf4\\xef\\x5cN\\xaf\\xea\\x22\\xf3#\\x0c\\x06\\x82\\x1b\\xf3\\xd5-+Y6\\x92\\xb4\\x93e\\x18\\xd9\\x92\\x8d\\x1ac\\xb9\\x92\\x800\\xc1\\xff\\xff^-\\xd3`+\\x04\\x05\\xe5\\x84\\x80-,\\x04T\\x98\\xc3\\xeb\\xbd\\xf7=\\xf0\\xa5/P\\x81\\xc6\\x9asb\\xd9\\x02\\xf2x\\x80J\\xca\\xb4{\\xef{_#}\\xc9Y\\xb9\\xe4\\xac\\xcb\\xccNm\\xb2\\xa7oin)\\xbd\\x10\\xe..."] [severity "ERROR"] [ver "OWASP_CRS/3.2.0"] [tag "application-multi"] [tag "language-php"] [tag "platform-multi"] [tag "attack-disclosure"] [tag "OWASP_CRS"] [t [hostname "shop.gnet.it"] [uri "/password"] [unique_id "YSY93rJJTBga6-8ecOI@VAAAAAE"], referer: https://shop.gnet.it/password

And another one:

[Wed Aug 25 12:55:58.601666 2021] [:error] [pid 20282] [client 83.216.165.253:59075] [client 83.216.165.253] ModSecurity: Warning. Operator GE matched 4 at TX:outbound_anomaly_score. [file "/usr/share/modsecurity-crs/rules/RESPONSE-959-BLOCKING-EVALUATION.conf"] [line "75"] [id "959100"] [msg "Outbound Anomaly Score Exceeded (Total Score: 4)"] [tag "anomaly-evaluation"] [hostname "shop.gnet.it"] [uri "/password"] [unique_id "YSY93rJJTBga6-8ecOI@VAAAAAE"], referer: https://shop.gnet.it/password

The fields "data", "severity" and "ver" are not always present in the log lines, and the "tag" field is sometimes repeated several times. How can I match a repeated field, and skip the optional fields when they are missing? The pattern above does not work as written.


Solution

  • You can use

    \[(?<timestamp>%{DAY:day} %{MONTH:month} %{MONTHDAY} %{TIME} %{YEAR})\]\s+\[:%{LOGLEVEL:loglevel}\]\s+\[pid %{NUMBER:pid}]\s+\[client %{IP:clientip}:.*?]\s+\[client %{IP:clientip2}.*?\]\s+ModSecurity:\s+%{WORD:modSecurity}.\s+%{GREEDYDATA:error}\.\s+\[file\s%{QS:path_file}\]\s+\[line %{QS:line}]\s+\[id %{QS:id}\]\s+\[msg %{QS:message}\](?:\s+\[data %{QS:data}\])?(?:\s+\[severity %{QS:severity}\])?(?:\s+\[ver %{QS:ver}\])?\s+\[tag %{QS:tag}\](?:\s+\[tag %{QS:tag2}\])?(?:\s+\[tag %{QS:tag3}\])?(?:\s+\[tag %{QS:tag4}\])?(?:\s+\[tag %{QS:tag5}\])?.*?\[hostname %{QS:hostname}\]\s+\[uri %{QS:uri}\]\s+\[unique_id %{QS:unique_id}\],\s+referer:\s%{URI:referer}
    

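    Here, the first tag is required and up to four more are captured as tag2 through tag5, so at most five tags are captured; any further tags are skipped by the .*? before hostname. The three fields data, severity and ver are now optional.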

    I also replaced \s with \s+ so that the pattern matches regardless of how many whitespace characters separate the fields, and I removed redundant parentheses from the pattern.
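
The same two ideas, optional non-capturing groups and a variable number of tags, can be tried outside Logstash. The following is a minimal sketch in plain Python `re`, not grok. The `QS` regex is only a rough stand-in for grok's `QS`, the function name `parse_tail` and the shortened sample lines are made up for illustration, and only the part of the line from `[msg ...]` to `[hostname ...]` is covered. Because a quantified group in most regex engines keeps only its last match, the sketch captures the whole run of tags as one string and splits it in a second step instead of numbering the tags.

```python
import re

# Rough stand-in for grok's QS: a double-quoted string that may contain
# backslash escapes (assumption; grok's real QS also accepts other quote styles).
QS = r'"(?:[^"\\]|\\.)*"'

# Each (?:...)? group makes one field optional; the tags group grabs the
# whole run of consecutive [tag "..."] blocks as a single string.
PATTERN = re.compile(
    rf'\[msg (?P<msg>{QS})\]'
    rf'(?:\s+\[data (?P<data>{QS})\])?'
    rf'(?:\s+\[severity (?P<severity>{QS})\])?'
    rf'(?:\s+\[ver (?P<ver>{QS})\])?'
    rf'(?P<tags>(?:\s+\[tag {QS}\])*)'
    rf'.*?\[hostname (?P<hostname>{QS})\]'
)

# Second pass: pull every individual tag out of the captured run.
TAG = re.compile(rf'\[tag ({QS})\]')


def parse_tail(line):
    """Return the fields found in the line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # Keep only fields that were present, and drop the surrounding quotes.
    fields = {
        name: value[1:-1]
        for name, value in m.groupdict().items()
        if value is not None and name != "tags"
    }
    fields["tags"] = [t[1:-1] for t in TAG.findall(m.group("tags"))]
    return fields


# Shortened, hypothetical variants of the two log lines from the question.
FULL = ('[msg "PHP source code leakage"] [data "Matched Data: <? found"] '
        '[severity "ERROR"] [ver "OWASP_CRS/3.2.0"] [tag "application-multi"] '
        '[tag "language-php"] [t [hostname "shop.gnet.it"]')
SHORT = ('[msg "Outbound Anomaly Score Exceeded (Total Score: 4)"] '
         '[tag "anomaly-evaluation"] [hostname "shop.gnet.it"]')

if __name__ == "__main__":
    print(parse_tail(FULL))
    print(parse_tail(SHORT))
```

For the second sample line the result has no data, severity or ver keys at all, which mirrors how the optional groups in the grok pattern simply produce no field when the block is missing. In Logstash, splitting a captured tag run would need a comparable second step after grok; that step is outside this sketch.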