I have the following input and I'd like to write a regular expression which would match every line except the first and last.
2019-03-13 00:33:44,846 [INFO] -: foo
2019-03-13 00:33:45,096 [INFO] -: Exception sending email
To:
[foo@bar.com, bar@bar.com]
CC:
[baz@bar.com]
Subject:
some subject
Body:
some
body
2019-03-13 00:33:45,190 [INFO] -: bar
I thought the following should work, but it doesn't behave as expected:
pcregrep -M ".+Exception sending email[\S\s]+?(?=\d{4}(-\d\d){2})" ~/test.log
In plain English I would describe this as: look for a line with the exception text, followed by any characters (including newlines), matched non-greedily, up to a positive lookahead for a date.
For some reason this also includes the final line, even though it doesn't on regex101. What am I missing here?
Normally I would just use grep -A in a case like this, but the problem is that the body can span an arbitrary number of lines.
It almost certainly has to do with the tool. As the changelog for pcregrep states under "Version 8.12 15-Jan-2011":
- In pcregrep, when a pattern that ended with a literal newline sequence was matched in multiline mode, the following line was shown as part of the match. This seems wrong, so I have changed it.
A simple fix is to match the newline character inside the lookahead expression, which keeps that newline out of the match and prevents the last line from showing:
pcregrep -M ".+Exception sending email[\S\s]+?(?=[\r\n]\d{4}(-\d\d){2})" ~/test.log
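To see what the extra `[\r\n]` changes, here is a small sketch using Python's `re` module rather than pcregrep. Python's engine is not PCRE, but it should agree with it for the constructs used here. The `LOG` string and the `matched_text` helper are only illustrative names, and the sample text is the log from the question. The point is that the original pattern's match ends with the newline after "body", which is presumably what made pcregrep print the following line, while the adjusted pattern stops just before that newline.

```python
import re

# Sample log copied from the question.
LOG = (
    "2019-03-13 00:33:44,846 [INFO] -: foo\n"
    "2019-03-13 00:33:45,096 [INFO] -: Exception sending email\n"
    "To:\n"
    "[foo@bar.com, bar@bar.com]\n"
    "CC:\n"
    "[baz@bar.com]\n"
    "Subject:\n"
    "some subject\n"
    "Body:\n"
    "some\n"
    "body\n"
    "2019-03-13 00:33:45,190 [INFO] -: bar\n"
)

# Pattern from the question: the lookahead starts at the date itself,
# so the newline before the date is consumed by the match.
ORIGINAL = r".+Exception sending email[\S\s]+?(?=\d{4}(-\d\d){2})"

# Pattern from the answer: the newline is moved into the lookahead,
# so the match ends before it.
FIXED = r".+Exception sending email[\S\s]+?(?=[\r\n]\d{4}(-\d\d){2})"


def matched_text(pattern, text=LOG):
    """Return the text of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    # repr() makes the trailing newline (or its absence) visible.
    print(repr(matched_text(ORIGINAL)[-6:]))
    print(repr(matched_text(FIXED)[-6:]))
```

Neither match contains the text of the final log line. The only difference is the trailing newline, which is the detail the pcregrep changelog entry is about.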