I have a file with thousands of timestamps. Some of them are in the standard format, while others are followed by a comma and three digits, like this:
Standard format: 00:00:44
Followed by comma and three digits: 00:00:46,235
I've removed the standard formats using the following regex:
text = re.sub(r'^((?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d$)', '', text)
That works. But for the time format followed by a comma and three digits, nothing I've tried so far removes it. How can I remove this odd time format pattern?
Your regex matches the standard time format.
r'^((?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d$)'
Just add the comma part at the end, right before the closing $ anchor, and make it optional.
r'^((?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d(?:,\d{3})?$)'
Explanation for (?:,\d{3})?:
(?: ) Non-capturing group
,\d{3} Comma, then three digits
? Match zero or one times
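As a quick check, here is a minimal sketch of the extended pattern in use. It assumes the whole file has been read into one multi-line string, so it passes re.MULTILINE to let ^ and $ match at every line boundary; the question does not say how the text is loaded, so drop that flag if you apply the regex line by line. The names TIME_RE, strip_times and sample are made up for this illustration.

```python
import re

# Pattern from the answer: HH:MM:SS with an optional ",mmm" suffix.
# re.MULTILINE is an assumption: it makes ^ and $ match at each line
# boundary instead of only at the start and end of the whole string.
TIME_RE = re.compile(
    r'^((?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d(?:,\d{3})?$)',
    re.MULTILINE,
)


def strip_times(text: str) -> str:
    """Remove lines that consist solely of a time, with or without ',mmm'."""
    return TIME_RE.sub('', text)


# Hypothetical sample input mixing both time formats with ordinary lines.
sample = "00:00:44\nfirst line\n00:00:46,235\nsecond line\n"
print(repr(strip_times(sample)))
```

Note that the substitution empties the matched lines but leaves their newline characters in place, so blank lines remain where the times were.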