Search code examples
logstashlogstash-grok

Grok regex with escaped “[“, “(“, and “)” chars problems


Elastic newbie here - working with a new 5.5 install. I have a log line that looks like so:

[2015/10/01@19:48:22.785-0400] P-4780 T-2208 I DBUTIL : (451) prostrct create session begin for timk519 on CON:.

I have the following regex:

\[%{DATE:date}@%{TIME:time}-(?<gmtoffset>\d{4})\]\s*(?<procid>P-[0-9]+)\s*(?<threadid>T-[0-9]+)\s*(?<msgtype>[ifIF])\s*(?<processtype>[a-zA-Z]+)\s*(?<usernumber>[0-9]+|[:])\s*\((?<msgnum>[0-9]+|[\-]+)\)\s*%{GREEDYDATA:message}

When I try it in the kibana grok debugger it doesn't work and I get the following error:

GrokDebugger: [parse_exception] [pattern_definitions] property isn't a map, but of type [java.lang.String], with { header={ processor_type="grok" & property_name="pattern_definitions" } }

this appears to be due to the \[ at the start of the line. If I replace the leading \[ with a period "." I get this

.%{DATE:date}@%{TIME:time}-(?<gmtoffset>\d{4})\]\s*(?<procid>P-[0-9]+)\s*(?<threadid>T-[0-9]+)\s*(?<msgtype>[ifIF])\s*(?<processtype>[a-zA-Z]+)\s*(?<usernumber>[0-9]+|[:])\s*\((?<msgnum>[0-9]+|[\-]+)\)\s*%{GREEDYDATA:message}

the grok debugger and https://grokdebug.herokuapp.com/ are good with this pattern.

When I put this regex into logstash, it fails to recognize the msgnum (451) part of the line because of the escaped parens \( and \) around the msgnum field, and as a result fails to recognize the line as a legal string.

Am I escaping something incorrectly? Is this a bug?

UPDATE 2017-07-21

I got around the issue with escaping ( and ) by putting them in [(] and [)]. I haven't figured out a way to solve matching the leading [ yet.

UPDATE 2017-07-24

The answer below was an epic catch and I've used that to create the following custom patterns:

DBTIME %{TIME}[-+]\d{4}
DBTIMESTAMP %{YEAR}/%{MONTHNUM}/%{MONTHDAY}@%{DBTIME}

which I've implemented in my grok statement like so:

\[%{DBTIMESTAMP:dbdatetime}\]\s*%{PROCESSID:processid}\s*%{DBTHREADID:threadid}\s*%{DBMSGTYPE:msgtype}\s*%{PROCESSTYPE:processtype}?\s*%{USERNUMBER:usernumber}?\s*:\s*[(]%{MSGNUMBER:msgnumber}[)].\s*%{GREEDYDATA:eventmessage}\s*\r

I then use the date filter to turn the dbdatetime into a @timestamp setting, and now the regex matches the incoming log stream which is what I want. Thx!


Solution

  • The devil is in the detail and the error is not apparent at first. The reason the Grok Debugger fails is because of your use of the DATE pattern. This pattern resolves like this:

    DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
    DATE_EU %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
    

    MONTHNUM and MONTHDAY are both 2 digit patterns, which in turn actually means they are matching the 15 in your year. This is the reason why the pattern does not work because \[%{DATE} is actually not matching (it is missing the 20). Why does the pattern .%{DATE} work tough? Because you are not matching the [ with the dot, your are matching the 0 of the year.

    How to fix this? Use a custom pattern to match the date. Something like this works:

    \[(?<date>%{YEAR}/%{MONTHNUM}/%{MONTHDAY})@%{TIME:time}-(?<gmtoffset>\d{4})\]\s*(?<procid>P-[0-9]+)\s*(?<threadid>T-[0-9]+)\s*(?<msgtype>[ifIF])\s*(?<processtype>[a-zA-Z]+)\s*(?<usernumber>[0-9]+|[:])\s*\((?<msgnum>[0-9]+|[\-]+)\)\s*%{GREEDYDATA:message}
    

    This will return the following output:

    {
      "date": "2015/10/01",
      "msgnum": "451",
      "procid": "P-4780",
      "processtype": "DBUTIL",
      "message": "prostrct create session begin for timk519 on CON:.",
      "threadid": "T-2208",
      "usernumber": ":",
      "gmtoffset": "0400",
      "time": "19:48:22.785",
      "msgtype": "I"
    }