Tags: python, regex, python-re

How to remove a pattern wrapped around nested text in a string using regex?


I have a string txt = 'The fat \m{cat sat} on \m{the} mat.' and I want the output 'The fat cat sat on the mat.'

I have tried the following two ways:

re.sub(r'\\m\{(.*)\}', '', txt) 
# output: 'The fat  mat.'

re.sub(r'\\m\{(?=.*)\}', '', txt) 
# output: 'The fat \\m{cat sat} on \\m{the} mat.'

Why do these two attempts fail, and how should I do it?


Solution

  • You can make your own regex work with two small changes:

    • Replace with a backreference to the captured group instead of an empty string
    • Make the quantifier lazy, i.e. (.*) -> (.*?), or use a negated character class, ([^}]*)

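    import re

    # Raw string, so the backslash in \m is kept literally
    txt = r'The fat \m{cat sat} on \m{the} mat.'
    # Lazy (.*?) stops at the first closing brace; \g<1> re-inserts the captured text
    r = re.sub(r'\\m\{(.*?)\}', r'\g<1>', txt)
    print(r)

    # The fat cat sat on the mat.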
    

    Note: you can use r"\1" or "\\1" instead of r"\g<1>" to refer back to the captured group.
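
As for why the two attempts in the question behave the way they do, here is a minimal sketch using re.findall to show what each pattern actually matches. The variable names (greedy, lazy, lookahead, results) are only illustrative and do not come from the question.

```python
import re

txt = r'The fat \m{cat sat} on \m{the} mat.'

# Greedy (.*) runs from the first "\m{" to the LAST "}" in the string,
# so a single match swallows everything in between.
greedy = re.findall(r'\\m\{(.*)\}', txt)
print(greedy)     # ['cat sat} on \\m{the']

# Lazy (.*?) stops at the first "}" after each "\m{".
lazy = re.findall(r'\\m\{(.*?)\}', txt)
print(lazy)       # ['cat sat', 'the']

# (?=.*) is a zero-width lookahead: it consumes no characters, so the
# pattern can only match a literal "\m{}" and finds nothing here.
lookahead = re.findall(r'\\m\{(?=.*)\}', txt)
print(lookahead)  # []

# The negated character class alternative, with the three equivalent
# ways of writing the backreference in the replacement.
results = [re.sub(r'\\m\{([^}]*)\}', repl, txt)
           for repl in (r'\g<1>', r'\1', '\\1')]
print(results)
```

So the first attempt deletes one long greedy match (and replaces it with nothing, since no backreference is used), while the second never matches at all and returns the string unchanged. Note that neither (.*?) nor ([^}]*) handles braces nested inside the \m{...} body. That case does not appear in the sample text.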