Regex: how to replace all occurrences of a substring within a string, but only if the whole string matches some filter


  • I need to replace all occurrences of a substring within a string, but only if the whole string matches some filter.
  • I can only use a single regex in one s command, because I need to send the assembled command to a 3rd party API.

I have tried to use a positive lookahead so as not to consume the string in which I want to replace characters, but I cannot get the replacement to work as expected.

Here is what I have tried so far and what the outcome was. (Note that the filter, here [0-9]+, is just an example; it is passed in from the call site and I cannot directly influence it.)

expected result: 9999997890

perl -e '$x = "4564567890"; $x =~ s/(?=^[0-9]+$)456/999/g; print $x'

actual result: 9994567890

  1. This replaces only the first occurrence of 456. Why is this happening?
  2. Even harder for me to understand: if I change the filter lookahead to (?=.*), both occurrences of 456 are replaced. Why does changing the filter have any effect on the replacing portion of the regex?

I seem to be missing some very basic point about how mixing filtering and replacing in one s command works.


Solution

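  • Your regex only replaces a 456 that sits at the very start of a digits-only string. With /g the whole pattern, lookahead included, is retried at each later position, and (?=^[0-9]+$) contains ^, so it can only succeed at offset 0. A lookahead such as (?=.*) succeeds at every position, which is why both occurrences are replaced in that case; it no longer filters anything.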

    You may use

    s/(?:\G(?!^)|^(?=\d+$))\d*?\K456/999/g
    

    Pattern details

    • (?:\G(?!^)|^(?=\d+$)) - a custom boundary that matches either the end of the previous successful match (\G, with (?!^) excluding the start of the string) or (|) the start of the string (^), provided the string contains only digits ((?=\d+$))
    • \d*? - zero or more digits, but as few as possible
    • \K - discards the text matched so far from the match, so only what follows is replaced
    • 456 - the literal substring 456.

    The idea is:

    • Use the \G based pattern to pre-validate the string: (?:\G(?!^)|^(?=<YOUR_VALID_LINE_FORMAT>$))
    • Then adapt the consuming part that follows it (here \d*?\K456) to the text that has to be skipped and the substring that has to be replaced.