Tags: regex, option-type, backreference, capturing-group

Python capturing groups with regex and conditional inserts


I have the code below, which finds the given pattern and returns it without the space in the middle. This works, but I also want to insert a / between \g<2> and \g<3> in the substitution, and only when one is not already there; otherwise I end up with a duplicate //. Everything I've tried messes up the capturing groups. Any help?

pattern = re.compile(r"((\d{1,2}/\d{1,2}) (/?(\d{4}|\d{2})))")

report_text = pattern.sub(r"\g<2>\g<3>", report_text)

Here are my inputs and expected result:

Inputs                    Expected

02/58 98                  02/58/98
02/58 /98                 02/58/98
02/58 9518                02/58/9518
02/58 /9518               02/58/9518

Solution

  • Your pattern uses more capturing groups than it needs. Place the optional /? outside the capturing groups, so that it is matched but not reinserted, and always write a literal / in the replacement:

    report_text = re.sub(r'(\d{1,2}/\d{1,2}) /?(\d{4}|\d{2})', r'\g<1>/\g<2>', report_text)
    

    See also https://ideone.com/YuWrAy
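
To check the approach against the inputs from the question, here is a minimal, self-contained sketch. The names DATE_RE and join_date are hypothetical and only used for illustration, and the fourth input is assumed to be "02/58 /9518", since that is what its expected result implies.

```python
import re

# The optional slash sits outside both capturing groups, so it is consumed
# by the match but never copied into the replacement.
DATE_RE = re.compile(r"(\d{1,2}/\d{1,2}) /?(\d{4}|\d{2})")


def join_date(text):
    """Hypothetical helper: rewrite 'dd/dd yy' or 'dd/dd /yy' as 'dd/dd/yy'."""
    # The replacement always writes exactly one literal slash.
    return DATE_RE.sub(r"\g<1>/\g<2>", text)


samples = ["02/58 98", "02/58 /98", "02/58 9518", "02/58 /9518"]
results = [join_date(s) for s in samples]

for before, after in zip(samples, results):
    print(f"{before!r} -> {after!r}")
```

Because the slash is written by the replacement rather than carried over from the match, each result contains a single / whether or not the input already had one.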