Tags: regex, grok, logstash-grok, ansi-escape, fluentd

Grok pattern to parse the ESC key


I am writing a grok pattern to parse cinder-api logs in fluentd. One line of the log looks like this:

2015-09-17 17:44:49.663 ^[[00;32mDEBUG oslo_concurrency.lockutils [^[[00;36m-^[[00;32m] ^[[01;35m^[[00;32mAcquired semaphore "singleton_lock"^[[00m ^[[00;33mfrom (pid=30534) lock /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:198^[[00m

The ^[[00;32m and similar sequences are ANSI colour codes, which a terminal renders as coloured text instead of printing them literally.

I need to parse the line. When there are no colour codes, I can do it with the (tested) pattern %{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{LOGLEVEL:loglevel}%{SPACE}%{NOTSPACE:api}%{SPACE}\[(?:%{DATA:request})\]%{SPACE}%{GREEDYDATA:message}

How do I modify the grok pattern so that I am able to parse the coloured log line?

Here is what I have found so far, in case it helps anyone arrive at a solution:

  • ^[ is the ESC (escape) control character: octal \033, hex \x1B, decimal ASCII code 27. ^[ is its caret notation.
  • There is a fluentd plugin named color-stripper that strips these codes, but it does not work for me and is not suitable for my use case.
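
As a quick sanity check of the equivalences listed above, here is a minimal Python sketch (Python is used only for illustration; the variable names are made up for this example). It assumes nothing beyond the standard string escapes and shows that all four notations denote the same single character.

```python
# The same ESC control character written four ways.
ESC_OCTAL = "\033"      # octal escape
ESC_HEX = "\x1b"        # hex escape
ESC_DECIMAL = chr(27)   # decimal ASCII code

# Caret notation: ^X is the character X with bit 0x40 flipped,
# so ^[ is ord("[") ^ 0x40 == 0x5B ^ 0x40 == 0x1B.
ESC_CARET = chr(ord("[") ^ 0x40)

assert ESC_OCTAL == ESC_HEX == ESC_DECIMAL == ESC_CARET
print(repr(ESC_HEX), ord(ESC_HEX))  # prints: '\x1b' 27
```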

Solution

  • A better solution than a literal escape character would be to follow the hints in the links provided:

    • Regular Expressions

      Grok sits on top of regular expressions, so any regular expressions are valid in grok as well. The regular expression library is Oniguruma, and you can see the full supported regexp syntax on the Oniguruma site.

    • Oniguruma Regular Expressions: 2. Characters

        \t  horizontal tab  (0x09)
        \v  vertical tab    (0x0B)
        \n  newline         (0x0A)
        \r  return          (0x0D)
        \b  back space      (0x08)
        \f  form feed       (0x0C)
        \a  bell            (0x07)
        \e  escape          (0x1B)

    Also, color codes can be mixed with other video attributes which do not use two digits. Quoting from XTerm Control Sequences:

    CSI Pm m  Character Attributes (SGR).
        Ps = 0    -> Normal (default).
        Ps = 1    -> Bold.
        Ps = 2    -> Faint, decreased intensity (ISO 6429).
        Ps = 3    -> Italicized (ISO 6429).
        Ps = 4    -> Underlined.
        Ps = 5    -> Blink (appears as Bold).
        Ps = 7    -> Inverse.
        Ps = 8    -> Invisible, i.e., hidden (VT300).
        Ps = 9    -> Crossed-out characters (ISO 6429).
        Ps = 2 1  -> Doubly-underlined (ISO 6429).
        Ps = 2 2  -> Normal (neither bold nor faint).
        Ps = 2 3  -> Not italicized (ISO 6429).
        Ps = 2 4  -> Not underlined.
        Ps = 2 5  -> Steady (not blinking).
        Ps = 2 7  -> Positive (not inverse).

    So besides colours, you might also see the codes for normal, bold, underline and reverse. Finally, the number of parameters is not limited to two, and parameters are optional. The result might be:

    \e\[(\d*;)*(\d*)m
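
To see the expression at work outside grok, here is a small Python sketch that strips the colour codes from the sample line and then splits the cleaned line into fields. It rests on several assumptions: Python's `re` module has no `\e` escape, so `\x1b` stands in for Oniguruma's `\e`; `strip_sgr` and `LOG_LINE` are hypothetical names; and `LOG_LINE` is only a rough stand-in for the grok pattern in the question, not what grok actually expands it to.

```python
import re

# \x1b replaces Oniguruma's \e, which Python's re module does not recognise.
# Zero or more "digits;" groups, then optional digits, then the final "m".
ANSI_SGR = re.compile(r"\x1b\[(?:\d*;)*\d*m")

# The sample line as displayed in the question; "^[" is swapped for a real ESC.
shown = (
    '2015-09-17 17:44:49.663 ^[[00;32mDEBUG oslo_concurrency.lockutils '
    '[^[[00;36m-^[[00;32m] ^[[01;35m^[[00;32mAcquired semaphore '
    '"singleton_lock"^[[00m ^[[00;33mfrom (pid=30534) lock '
    '/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:198^[[00m'
)
line = shown.replace("^[", "\x1b")


def strip_sgr(text):
    """Remove SGR (colour/attribute) escape sequences from text."""
    return ANSI_SGR.sub("", text)


# Rough equivalent of the grok pattern from the question (an approximation).
LOG_LINE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<loglevel>[A-Z]+)\s+"
    r"(?P<api>\S+)\s+"
    r"\[(?P<request>.*?)\]\s+"
    r"(?P<message>.*)"
)

plain = strip_sgr(line)
fields = LOG_LINE.match(plain).groupdict()
print(fields["loglevel"], fields["api"], fields["request"])
```

Stripping first keeps the field pattern unchanged. The alternative is to place the colour expression, made optional and repeatable, between the fields of the grok pattern itself, which avoids a separate pass but leaves any codes inside the message field in place.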