My log file is in the following format:
mm/dd/yyyy hh:mm:ss Description
11/05/2013 03:01:00 Shutting down server...
11/05/2013 03:01:23 DumpCache(): 284114 items.
To keep things simple, I can use the following regex to match the date:
^(../../....)
This works because I know every line starts with a 10-character date, slashes included. But the purpose of this search is not to find every line in the log; it is to find where a line's date does not match the date of the previous line (a date change).
I imagine that a lookbehind is capable of doing this, but I cannot figure out how to ignore the time and the description and look only at the date of the previous line.
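In Python, match a whole line and then use a negative lookahead with a backreference to the date, so that a line matches only when the next line does not start with the same date: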
import re
diff_line_re = re.compile(r'''
(?:
(?P<date>\d{2}/\d{2}/\d{4})
\s+
(?P<time>[\d:]+)
\s+
(?P<message>[^\n]+)
\n
)(?!(?P=date))
''', re.X)
Given the data:
log_lines = '''
11/05/2013 03:01:00 1 Shutting down server...
11/05/2013 03:01:23 2 DumpCache(): 284114 items.
11/05/2013 03:01:00 3 Shutting down server...
11/07/2013 03:01:23 5 DumpCache(): 284114 items.
11/07/2013 03:01:00 6 Shutting down server...
11/08/2013 03:01:23 7 DumpCache(): 284114 items.
11/08/2013 03:01:00 8 Shutting down server...
11/09/2013 03:01:23 9 DumpCache(): 284114 items.
'''
We execute the script:
print(diff_line_re.findall(log_lines))
Output (wrapped for readability; each match is the last line before the date changes):
[('11/05/2013', '03:01:00', '3 Shutting down server...'),
('11/07/2013', '03:01:00', '6 Shutting down server...'),
('11/08/2013', '03:01:00', '8 Shutting down server...'),
('11/09/2013', '03:01:23', '9 DumpCache(): 284114 items.')]
See the Python regular expression documentation for details: http://docs.python.org/3/library/re.html#module-re
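The regex above reports the last line of each date. If you want the first line of each new date instead, which is what a lookbehind would have given you, a plain loop that remembers the previous line's date may be simpler than a regex. The following is only a sketch: the function name first_lines_of_new_dates and the shortened sample data are made up for illustration, and it assumes every non-blank line starts with the date as its first whitespace-separated field.

```python
def first_lines_of_new_dates(text):
    """Return each log line whose date differs from the previous line's date."""
    changes = []
    previous_date = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Assumption: the first whitespace-separated field is the mm/dd/yyyy date.
        date = line.split(None, 1)[0]
        if previous_date is not None and date != previous_date:
            changes.append(line)
        previous_date = date
    return changes


# Hypothetical sample, shortened from the data above.
sample = '''
11/05/2013 03:01:00 1 Shutting down server...
11/05/2013 03:01:23 2 DumpCache(): 284114 items.
11/07/2013 03:01:23 5 DumpCache(): 284114 items.
11/07/2013 03:01:00 6 Shutting down server...
11/08/2013 03:01:23 7 DumpCache(): 284114 items.
'''

for line in first_lines_of_new_dates(sample):
    print(line)
# prints:
# 11/07/2013 03:01:23 5 DumpCache(): 284114 items.
# 11/08/2013 03:01:23 7 DumpCache(): 284114 items.
```

The very first line is not reported because there is no previous line to compare it with; remove the previous_date is not None check if you want it included.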