How to use regex substitute using different capture and match strings?


I want the regex engine to look for a certain pattern, and then replace only part of what it matched. The string looks like this:

string1 = 'r|gw|gwe|bbbss|gwe | s'

I want to replace some of the pipe-delimited fields using a regex like this:

re.sub(r'\|(gw.*)\|', 'nn', string1)

So I want to match the text between the |'s, but replace only what is between them, not the entire |(gw.*)| match.

Is there a concise way to do this?


Solution

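  • To keep the pipe characters, and to match adjacent fields that share a pipe, use lookaround assertions: they check that a pipe is present without consuming it. Also, because * is a greedy quantifier, .* consumes as much as possible, which here means everything up to the last pipe.

    Use a negated character class such as [^|]* (or the non-greedy *?) so that each match stops at the next pipe: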

    >>> re.sub(r'(?<=\|)gw[^|]*(?=\|)', 'nn', string1)
    'r|nn|nn|bbbss|nn| s'
    

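    Or you could skip the regex, split on the pipes, and rebuild the string. Unlike the lookaround pattern, this also replaces a first or last field that starts with gw:

    >>> '|'.join(['nn' if i.startswith('gw') else i for i in string1.split('|')])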
    'r|nn|nn|bbbss|nn| s'
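
To see why the original pattern fails and how the title's "capture one thing, replace another" idea can work, here is a minimal sketch using the sample string from the question. The names greedy, lazy, and captured are illustrative only. The last variant is one possible alternative, not the only one: it captures the leading pipe and writes it back with a backreference, while the trailing pipe stays in a lookahead so that neighbouring fields can still match.

```python
import re

string1 = 'r|gw|gwe|bbbss|gwe | s'

# Original attempt: greedy .* runs to the last pipe, and both pipes are replaced too.
greedy = re.sub(r'\|(gw.*)\|', 'nn', string1)

# Lookarounds with the non-greedy *? instead of a negated character class.
lazy = re.sub(r'(?<=\|)gw.*?(?=\|)', 'nn', string1)

# No lookbehind: capture the leading pipe and put it back with \1.
captured = re.sub(r'(\|)gw[^|]*(?=\|)', r'\1nn', string1)

print(greedy)    # rnn s
print(lazy)      # r|nn|nn|bbbss|nn| s
print(captured)  # r|nn|nn|bbbss|nn| s
```

If the trailing pipe were consumed as well (for example with a second capture group), the field immediately after a match would lose its leading pipe and be skipped, which is why the lookahead is kept.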