Search code examples
regexrubyregex-greedyfluentd

Regex Issue when log format has multiple ip's


I have an issue with fluenTd log parser. The following config works fine when there are 2 ip’s.

expression  /^(?<client_ip>[^ ]*)(?:, (?<lb_ip>[^ ]*))? (?<ident>[^ ]*) (?<user>[^ ]*) \[(?<time>[^ ]* [^ ]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) (?<protocol>[A-Z]{1,}[^ ]*)+\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)/

This matches:

148.165.41.129, 10.25.1.120 - - [09/Dec/2019:16:22:23 +0000] "GET /comet_request/44109669162/F1551019433002Y5MYEP?F155101943300742PMLG=1551019433877&_=1575904426457 HTTP/1.1" 200 0 0 0

When there are 3 ip’s, i get a pattern not match warning.

This doesn't match :

176.30.235.70, 165.225.70.200, 10.25.1.120 - - [09/Dec/2019:13:30:57 +0000] \"GET /comet_request/71142769981/F1551018730440IY5YNF?F1551018721447ZVKYZ4=1551018733078&_=1575898029473 HTTP/1.1\" 200 0 0 0

I tried the following regex, but doesn't work.Can someone please help?

expression /^(?<client_ip>[^ ]*)(?:, (?<proxy_ip>[^ ]*))? (?:, (?<lb_ip>[^ ]*))? (?<ident>[^ ]*) (?<user>[^ ]*) \[(?<time>[^ ]* [^ ]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) (?<protocol>[A-Z]{1,}[^ ]*)+\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)$/

Solution

  • You need to match the IPs with a more specific pattern, like [\d.]+ or [^, ]+, and make sure you also match the last two fields (you are not matching them and $ requires the end of line/string).

    Use a pattern like

    ^(?<client_ip>[^ ,]+)(?:, +(?<proxy_ip>[^ ,]+))?(?:, +(?<lb_ip>[^ ,]+))? (?<ident>[^ ]+) (?<user>[^ ]+) \[(?<time>[^\]\[ ]* [^\]\[ ]*)\] "(?<method>\S+)(?: +(?<path>\S+) (?<protocol>[A-Z][^" ]*)[^"]*)?" (?<code>\S+) (?<size>\S+) \S+ \S+$
    

    See the regex demo

    The IP matching part is ^(?<client_ip>[^ ,]+)(?:, +(?<proxy_ip>[^ ,]+))?(?:, +(?<lb_ip>[^ ,]+))?, see that [^ ,]+ matches 1+ chars other than a space and , and \S+ \S+ are added at the end of the pattern (if these are numbers, you may use \d+ \d+ and capture them if needed).