regex linux bash sed substring

Can you replace a substring from a regex match?


I am trying to find datetime stamps that have an incorrect timezone offset because of Daylight Saving Time (starting or ending). Where the offset is wrong, I want to replace it with the correct one for those matches.

e.g. 2024-03-10T00:15:00-07:00 should become 2024-03-10T00:15:00-08:00

I came up with the following regex and am able to find the culprits in the file using grep:

grep -E '~2024-03-10T[01]{2}:[0-9]{2}:[0-9]{2}-07:00' <filename>

I am trying to do the substitution using sed and the regex I came up with, but I can't seem to get it to work correctly.

Here's what I currently have:

sed 's/\(~2024-03-10T[01]{2}:[0-9]{2}:[0-9]{2}\)-07:00~/\1-08:00~/g' filename

What am I missing? This command does not seem to make any substitutions. Any help would be appreciated! Thanks in advance!


Solution

  • Setup:

    $ cat filename
    ~2024-03-09T00:15:00-07:00~ some text
    ~2024-03-10T00:15:00-07:00~ some text
    ~2024-03-11T00:15:00-07:00~ some text
    

    I tend to use -E to enable support for extended regexes and to simplify the capture group syntax.

    This requires just a couple of small changes to your current code:

    sed -E 's/(~2024-03-10T[01]{2}:[0-9]{2}:[0-9]{2})-07:00/\1-08:00/g' filename
        ^^    ^                                     ^
    

    Where:

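    • -E - enable support for extended regexes
    • ( | ) - no need to escape the parens (capture group delimiters)
    • {2} - intervals now work unescaped; without -E, sed uses basic regexes, where {2} is matched literally (an interval has to be written \{2\}), which is why your original command made no substitutions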

    This generates:

    ~2024-03-09T00:15:00-07:00~ some text
    ~2024-03-10T00:15:00-08:00~ some text
    ~2024-03-11T00:15:00-07:00~ some text
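
If you would rather stay with basic regexes, the same substitution should work once the intervals are escaped as well. Here is a minimal sketch, assuming a sed that accepts -E (GNU and BSD sed both do); the temporary file and the variable names (tmpfile, ere_out, bre_out, orig_out) are made up for illustration and just hold the sample data from the setup:

```shell
#!/bin/sh
# Sample data from the setup above, written to a temporary file (hypothetical name).
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
~2024-03-09T00:15:00-07:00~ some text
~2024-03-10T00:15:00-07:00~ some text
~2024-03-11T00:15:00-07:00~ some text
EOF

# Extended regex (-E): ( ) and {2} are special without backslashes.
ere_out=$(sed -E 's/(~2024-03-10T[01]{2}:[0-9]{2}:[0-9]{2})-07:00/\1-08:00/g' "$tmpfile")

# Basic regex (no -E): both the group and the intervals need backslashes.
bre_out=$(sed 's/\(~2024-03-10T[01]\{2\}:[0-9]\{2\}:[0-9]\{2\}\)-07:00/\1-08:00/g' "$tmpfile")

# Command from the question: in basic mode {2} is a literal string,
# so the pattern never matches and the input comes back unchanged.
orig_out=$(sed 's/\(~2024-03-10T[01]{2}:[0-9]{2}:[0-9]{2}\)-07:00~/\1-08:00~/g' "$tmpfile")

printf '%s\n' "$ere_out"
rm -f "$tmpfile"
```

Both the -E form and the escaped basic form should give the same result here, so which one you use is mostly a matter of readability. One thing to double-check in your real data: [01]{2} also matches hours 10 and 11, so if only 00:00 through 01:59 should be rewritten, 0[01] may be the tighter choice.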