I was writing a grok pattern to parse the cinder-api logs in fluentd, one line of which is:
2015-09-17 17:44:49.663 ^[[00;32mDEBUG oslo_concurrency.lockutils [^[[00;36m-^[[00;32m] ^[[01;35m^[[00;32mAcquired semaphore "singleton_lock"^[[00m ^[[00;33mfrom (pid=30534) lock /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:198^[[00m
The ^[[00;32m and other such occurrences are ANSI colour codes, which a terminal renders as coloured text instead of printing them literally.
I need to parse the line, and I am able to do so when there are no colour codes, using this (tested) pattern:
%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{LOGLEVEL:loglevel}%{SPACE}%{NOTSPACE:api}%{SPACE}\[(?:%{DATA:request})\]%{SPACE}%{GREEDYDATA:message}
How do I modify the grok pattern so that I am able to parse the coloured log line?
I have found out the following, in case it helps anyone arrive at a solution:
^[ is actually the ESC character, whose octal code is \033, hex code is \x1B, and decimal ASCII code is 27; ^[ is another way of writing it (caret notation).
A better solution than a literal escape character would be to follow the hints in the links provided:
Grok sits on top of regular expressions, so any regular expressions are valid in grok as well. The regular expression library is Oniguruma, and you can see the full supported regexp syntax on the Oniguruma site.
\t horizontal tab (0x09)
\v vertical tab (0x0B)
\n newline (0x0A)
\r return (0x0D)
\b back space (0x08)
\f form feed (0x0C)
\a bell (0x07)
\e escape (0x1B)
Also, color codes can be mixed with other video attributes which do not use two digits. Quoting from XTerm Control Sequences:
CSI Pm m Character Attributes (SGR).
Ps = 0 -> Normal (default).
Ps = 1 -> Bold.
Ps = 2 -> Faint, decreased intensity (ISO 6429).
Ps = 3 -> Italicized (ISO 6429).
Ps = 4 -> Underlined.
Ps = 5 -> Blink (appears as Bold).
Ps = 7 -> Inverse.
Ps = 8 -> Invisible, i.e., hidden (VT300).
Ps = 9 -> Crossed-out characters (ISO 6429).
Ps = 2 1 -> Doubly-underlined (ISO 6429).
Ps = 2 2 -> Normal (neither bold nor faint).
Ps = 2 3 -> Not italicized (ISO 6429).
Ps = 2 4 -> Not underlined.
Ps = 2 5 -> Steady (not blinking).
Ps = 2 7 -> Positive (not inverse).
You might also see those for normal, bold, underline and reverse. Finally, the number of parameters is not limited to two, and parameters are optional. The result might be:
\e\[(\d*;)*(\d*)m
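To check that an expression of this shape really covers the colour codes in the sample line, here is a minimal sketch in Python rather than grok. It rests on a few assumptions: Python's re module has no \e escape, so ESC is written as \x1b; the groups are made non-capturing; and the names SGR and strip_sgr are hypothetical helpers introduced only for this illustration. The sample is a shortened copy of the log line from the question, with ^[ replaced by a real ESC character.

```python
import re

# SGR sequence: ESC, '[', zero or more optional numbers each followed by ';',
# one more optional number, then 'm'.
# Assumption: \x1b stands in for \e, which Python's re does not support.
SGR = re.compile(r"\x1b\[(?:\d*;)*\d*m")


def strip_sgr(text: str) -> str:
    """Return text with all SGR (colour/attribute) sequences removed."""
    return SGR.sub("", text)


ESC = "\x1b"  # the character shown as ^[ in the question

# Shortened version of the log line from the question.
line = (
    f'2015-09-17 17:44:49.663 {ESC}[00;32mDEBUG oslo_concurrency.lockutils '
    f'[{ESC}[00;36m-{ESC}[00;32m] {ESC}[01;35m{ESC}[00;32mAcquired semaphore '
    f'"singleton_lock"{ESC}[00m'
)

clean = strip_sgr(line)
print(clean)
```

Once the codes are stripped, the line has the same shape as an uncoloured one. In grok itself, the same expression could presumably be placed wherever a colour code may occur between fields, but that placement is not tested here.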