Tags: python, regex, python-re

Python regex for a time format followed by a comma and three digits


I have a file with thousands of timestamps. Some are in the standard format, while others are followed by a comma and three digits, like this:

    Standard format: 00:00:44
    Followed by comma and three digits: 00:00:46,235

I've removed the standard formats using the following regex:

    text = re.sub(r'^((?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d$)', '', text)

That works. But nothing I've tried so far removes the timestamps that are followed by a comma and three digits. How can I remove this second pattern?


Solution

  • Your regex matches the standard time format.

    r'^((?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d$)'
    

    Add the comma-and-digits part just before the $ anchor and make it optional, so that the same pattern matches both formats.

    r'^((?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d(?:,\d{3})?$)'
    

    Explanation for (?:,\d{3})?:

    (?:      )     Non-capturing group
       ,\d{3}      Comma, then three digits
              ?    Match zero or one times
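
As a minimal sketch of how the extended pattern might be applied, the snippet below runs it over a small sample. It assumes that text holds the whole file as one string, which is why re.MULTILINE is passed so that ^ and $ anchor at every line; if you process the file one line at a time, the flag is not needed. The names TIME_RE, strip_times and sample are made up for illustration.

```python
import re

# Pattern from the answer: HH:MM:SS, optionally followed by a comma and
# three digits. re.MULTILINE is an assumption: it lets ^ and $ match at
# the start and end of every line when the whole file is one string.
TIME_RE = re.compile(
    r'^((?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d(?:,\d{3})?$)',
    re.MULTILINE,
)


def strip_times(text: str) -> str:
    """Remove timestamps that occupy a whole line, in either format."""
    return TIME_RE.sub('', text)


# Hypothetical sample data, not taken from the asker's file.
sample = "00:00:44\nhello\n00:00:46,235\nworld\n"
cleaned = strip_times(sample)
print(repr(cleaned))
```

Note that only the timestamp is removed, so the line break after it stays behind as an empty line. If you want to match only the comma form, drop the trailing ? so that the group becomes required: (?:,\d{3}).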