Tags: python, regex, parsing, log-analysis

Python regex is not extracting a substring from my log file


I'm using

    date = re.findall(r"^(?:\w{3} ){2}\d{2} (?:[\d]{2}:){2}\d{2} \d{4}$", message)

in Python 2.7 to extract the substrings:

    Wed Feb 04 13:29:49 2015
    Thu Feb 05 13:45:08 2015

from a log file like this:

    1424,Wed Feb 04 13:29:49 2015,51
    1424,Thu Feb 05 13:45:08 2015,29

It is not working. I'm required to use regex for this task; otherwise I would have just used split(). What am I doing wrong?


Solution

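  • Your substrings are neither at the start nor at the end of the string: each line begins with an ID (1424,) and ends with a count (,51 or ,29). Without re.MULTILINE, ^ and $ refer to the start and end of the whole string, so these anchors prevent any match. Remove them: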

    >>> import re
    >>> s = """
    1424,Wed Feb 04 13:29:49 2015,51
    1424,Thu Feb 05 13:45:08 2015,29"""
    >>> date = re.findall(r"(?:\w{3} ){2}\d{2} (?:[\d]{2}:){2}\d{2} \d{4}", s)
    >>> date
    ['Wed Feb 04 13:29:49 2015', 'Thu Feb 05 13:45:08 2015']
    

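    Alternatively, you can use a positive look-behind that matches the rest of each line after the four-digit ID and its comma (note that this also returns the trailing count field):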

    >>> date = re.findall(r"(?<=\d{4},).*", s)
    >>> date
    ['Wed Feb 04 13:29:49 2015,51', 'Thu Feb 05 13:45:08 2015,29']
    

    Or, without regex at all, you can use str.split() and str.partition() for this kind of task:

    >>> s ="""
    1424,Wed Feb 04 13:29:49 2015,51
    1424,Thu Feb 05 13:45:08 2015,29"""
    
    >>> [i.partition(',')[-1] for i in s.strip().split('\n')]
    ['Wed Feb 04 13:29:49 2015,51', 'Thu Feb 05 13:45:08 2015,29']
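If you would rather keep the ^ and $ anchors, they have to describe the whole line (the leading ID and the trailing count included), and re.MULTILINE is needed so they match at every line boundary instead of only at the ends of the whole string. Here is a minimal sketch, assuming every line has the form ID,date,count with integer ID and count. The date sits in a capturing group, so findall returns just that group. For comparison, it also runs the question's original pattern, which finds nothing:

```python
import re

# Sample content in the "ID,date,count" layout shown in the question
s = """
1424,Wed Feb 04 13:29:49 2015,51
1424,Thu Feb 05 13:45:08 2015,29"""

# The question's pattern: ^ and $ anchor to the whole string, which starts
# with a newline and whose lines start with "1424,", so nothing matches.
original = r"^(?:\w{3} ){2}\d{2} (?:[\d]{2}:){2}\d{2} \d{4}$"
original_matches = re.findall(original, s)
print(original_matches)

# Anchored version: the pattern spells out the whole line, re.MULTILINE makes
# ^ and $ match at each line start/end, and only the date is captured.
pattern = r"^\d+,((?:\w{3} ){2}\d{2} (?:\d{2}:){2}\d{2} \d{4}),\d+$"
dates = re.findall(pattern, s, re.MULTILINE)
print(dates)
```

Because the pattern has exactly one group, findall returns the list of captured dates rather than the full matched lines.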
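The look-behind pattern above also returns the trailing count (,51 and ,29). If only the date is wanted, adding a look-ahead stops the match before the next comma. This is a sketch that assumes each line has exactly three comma-separated fields and that the date itself never contains a comma:

```python
import re

s = """
1424,Wed Feb 04 13:29:49 2015,51
1424,Thu Feb 05 13:45:08 2015,29"""

# (?<=,)   the match must start right after a comma
# [^,\n]+  take everything up to the next comma, never crossing a line break
# (?=,)    require that a comma follows; this rules out the last field on
#          each line, which is followed by a newline or the end of the string
dates = re.findall(r"(?<=,)[^,\n]+(?=,)", s)
print(dates)
```

In other words, this picks out the middle field of each line. If the real log has more fields, the pattern would need adjusting.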
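Since the log lines are really comma-separated records, the standard-library csv module is another non-regex option that hands you each field directly, without the trailing count attached. A minimal sketch, assuming the date is always the second field:

```python
import csv

s = """
1424,Wed Feb 04 13:29:49 2015,51
1424,Thu Feb 05 13:45:08 2015,29"""

# strip() drops the leading newline so there is no empty first row;
# csv.reader accepts any iterable of strings, such as a list of lines.
rows = csv.reader(s.strip().splitlines())
dates = [row[1] for row in rows]  # second field is the timestamp
print(dates)
```

With an actual log file you would pass the open file object to csv.reader instead of the list from splitlines().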