We have log files containing output from the Apache HTTP client. We are seeing output as it goes "over the wire" and it includes lines like:
<< HTTP/1.1 200 The request has succeeded
The chevrons '<<' indicate incoming, in contrast to '>>' for outgoing content. Using 'tail -F' to follow these logs is entertaining enough but I thought it would be a useful exercise to use sed to colorize the output according to whether it is input or output.
A simple test will show you what I mean:
echo '<< HTTP/1.1 200 The request has succeeded' | sed -r -e 's_<<_\x1b[31;1m&\x1b[0m_i' -e 's_>>_\x1b[32;1m&\x1b[0m_i'
for input, and
echo '>> HTTP/1.1 200 The request has succeeded' | sed -r -e 's_<<_\x1b[31;1m&\x1b[0m_i' -e 's_>>_\x1b[32;1m&\x1b[0m_i'
for output.
So far, so good. The descent into regex madness began when it occurred to me that it would be even more useful to highlight the HTTP response codes and colorize them according to the class: green for 2xx and red for 5xx, for example.
So far I can match up to the first digit in the response code with:
echo '<< HTTP/1.1 200 The request has succeeded' | sed -r -e 's_<<_\x1b[31;1m&\x1b[0m_i' -e 's_>>_\x1b[32;1m&\x1b[0m_i' -e 's_HTTP[^[:alpha:]]*2\d*_\x1b[32;1m&\x1b[0m_g'
It only colorizes up to '<< HTTP/1.1 2'. My expectation was that 'HTTP[^[:alpha:]]*2\d*' would match 'HTTP', followed by everything that is not alphabetic up to '2', followed by any number of digits. Ideally I would use '{2}' rather than '*', but that has the same effect.
Can any regex guru point out my mistake?
echo '<< HTTP/1.1 200 The request has succeeded' | \
sed -r -e 's_<<_\x1b[31;1m&\x1b[0m_;t http
s_>>_\x1b[32;1m&\x1b[0m_
:http
s_HTTP[^[:alpha:]]+2[0-9]+_\x1b[32;1m&\x1b[0m_g'
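Try the above. What changed, and why:
Drop the 'i' flag at the end of the '<<' and '>>' substitutions: those characters have no upper or lower case, so case-insensitive matching does nothing.
Use '[0-9]' for the digits instead of '\d'. sed's regular expressions have no '\d' digit shorthand, which is why your match stopped at the '2'.
Replace '*' (or '{2}') with a one-or-more quantifier: '\{1,\}' in basic syntax, or '{1,}' or '+' with 'sed -r' on GNU sed.
Add 't http' with a ':http' label: once '<<' has matched, sed jumps straight to the status-code rule and skips the '>>' test. You will certainly extend this script, and jumps like this keep it fast as it grows.
Also try '-u' (unbuffered), which works better on a live stream such as 'tail -F'.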
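If you want to go on to color by status class (green for 2xx, red for 5xx), here is a rough sketch of how the script might grow. It assumes GNU sed (for '-r', '-u' and the '\x1b' escape). The function name 'colorize_http' and the label 'status' are made up for illustration, and the tighter 'HTTP/[0-9.]+ ' prefix is my own choice, so that a code such as 520 is not mistaken for a 2xx.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): colorize the direction
# chevrons and the HTTP status code by class. Assumes GNU sed.
colorize_http() {
    # -u: unbuffered, so lines appear immediately when fed from a live stream
    # -r: extended regex, so + and {2} need no backslashes
    sed -u -r \
        -e 's_<<_\x1b[31;1m&\x1b[0m_;t status' \
        -e 's_>>_\x1b[32;1m&\x1b[0m_' \
        -e ':status' \
        -e 's_HTTP/[0-9.]+ 2[0-9]{2}_\x1b[32;1m&\x1b[0m_' \
        -e 's_HTTP/[0-9.]+ 5[0-9]{2}_\x1b[31;1m&\x1b[0m_'
}
```

You would then use it along the lines of 'tail -F client.log | colorize_http', where 'client.log' stands in for your own log file. Other classes (3xx, 4xx) can be added as further substitutions after the ':status' label.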