
Retain intermediate string using regex


I want to append a substring to a string pattern at every occurrence across multiple Python code files. However, the original string follows a pattern and is not exactly the same each time. Below are some examples of the variation:

Original Code:  a.b();
Want Code:      a.b().c();
Original Code:  a.b(param1=1);
Want Code:      a.b(param1=1).c();
Original Code:  a.b(param1=1, param2=2);
Want Code:      a.b(param1=1, param2=2).c();
Original Code:  a.b(param1=D());
Want Code:      a.b(param1=D()).c();
Original Code:  X(a.b(param1=D()));
Want Code:      X(a.b(param1=D()).c());

Update: Since I am replacing code in a file, the file contains indentation and newlines for better readability, e.g.:

Original Code:  X(a.b(
                     param1=D()
                     )
                 );

Want Code:      X(a.b(
                     param1=D()
                     ).c()
                 );
Original Code:  X(a.b(
                     param1=D(),
                     param2="qwerty"
                     )
                 );

Want Code:      X(a.b(
                     param1=D(),
                     param2="qwerty"
                     ).c()
                 );
Original Code:  X(a.b(
                       newObj())
                 );

Want Code:      X(a.b(
                       newObj()).c()
                 );

I am not really concerned about the parameters passed to function b. I simply need to append an invocation of c() every time a.b() is invoked.

I am using the regex 'a.b(.*?)' to detect the appropriate original code. I tried the following replacement strings: a.b($1).c() and a.b(\1).c(), but to no avail.


Solution

  • You can use

    a\.b\([^()]*\)(?=;)
    
    • a\.b Match a.b literally (the dot is escaped)
    • \([^()]*\) Match from an opening parenthesis to the next closing parenthesis, using a negated character class for everything in between
    • (?=;) Positive lookahead, assert a ; directly to the right
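
To see why the escaping described in these bullets matters, here is a small sketch comparing the pattern attempted in the question with an escaped version. It assumes the standard-library re module; the lookahead is left out of the escaped pattern only to keep the comparison short, and the variable names are illustrative.

```python
import re

attempt = r"a.b(.*?)"        # pattern from the question: unescaped dot and parentheses
escaped = r"a\.b\([^()]*\)"  # escaped pattern, lookahead omitted for brevity
line = "a.b(param1=1);"

m = re.search(attempt, line)
# The unescaped parentheses form a capture group, and the lazy .*? matches nothing,
# so the whole match is just "a.b" and group 1 is empty.
matched_text = m.group(0)
captured = m.group(1)

# The unescaped dot matches any character, so unrelated text matches too.
also_matches = re.search(attempt, "a_b") is not None

broken = re.sub(attempt, r"a.b(\1).c()", line)
fixed = re.sub(escaped, r"\g<0>.c()", line)

print(repr(matched_text), repr(captured), also_matches)
print(broken)
print(fixed)
```

The replacement string \1 is valid in Python; it is the search pattern that needs escaping.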


    Then replace with the full match, \g<0>, followed by .c():

    \g<0>.c()
    

    For example:

    import re
    
    regex = r"a\.b\([^()]*\)(?=;)"
    
    s = ("a.b();\n"
        "a.b(param1=1);\n"
        "a.b(param1=1, param2=2);")
    
    result = re.sub(regex, r"\g<0>.c()", s)
    
    if result:
        print(result)
    

    Output

    a.b().c();
    a.b(param1=1).c();
    a.b(param1=1, param2=2).c();
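
The pattern above has two limits worth checking against the multi-line examples in the question: the (?=;) lookahead skips calls that are not directly followed by ;, and [^()]* stops at the first nested parenthesis. The following is a minimal sketch, assuming the lookahead is simply dropped (a variation, not part of the pattern as given above); the sample text is made up for illustration.

```python
import re

# Same pattern as above, but without the (?=;) lookahead (an assumption for this sketch).
pattern = re.compile(r"a\.b\([^()]*\)")

samples = (
    "X(a.b(param1=1));\n"   # call wrapped in X(...), not directly followed by ;
    "a.b(\n"
    "    param1=1,\n"
    "    param2=2\n"
    ");\n"                  # call spread over several lines
    "a.b(param1=D());"      # nested parentheses inside the arguments
)

result = pattern.sub(r"\g<0>.c()", samples)
print(result)
```

Because a negated character class also matches newlines, the multi-line call is handled, and without the lookahead the wrapped call is handled too. The last line stays unchanged, since [^()]* cannot cross the nested D().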
    

    To also match nested (balanced) parentheses, such as a.b(param1=D()), you can use a recursive pattern with the PyPI regex module:

    a\.b(\((?>[^()]++|(?1))*\))
    

    The pattern matches:

    • a\.b Match a.b literally
    • ( Capture group 1
      • \( Match (
      • (?> Atomic group (no backtracking)
        • [^()]++ Match 1+ occurrences of any char except ( or )
        • | Or
        • (?1) Recurse the first subpattern (group 1)
      • )* Close the atomic group and repeat it 0 or more times
      • \) Match )
    • ) Close group 1


    import regex
    
    pattern = r'a\.b(\((?>[^()]++|(?1))*\))'
    strings = [
        "a.b();",
        "a.b(param1=1);",
        "a.b(param1=1, param2=2);",
        "a.b(param1=d(abc=\"123\"));"
    ]
    
    for s in strings:
        m = regex.match(pattern, s)
        if m:
            print(f"{m.group()}.c()")
    

    Output

    a.b().c()
    a.b(param1=1).c()
    a.b(param1=1, param2=2).c()
    a.b(param1=d(abc="123")).c()
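
If installing the third-party regex module is not an option, the same balanced-parenthesis idea can be approximated with the standard library by counting parentheses by hand. This is a different technique from the recursive pattern above, shown only as a rough sketch: the function name append_call and its suffix parameter are hypothetical, and the scanner assumes parentheses never appear inside string literals or comments.

```python
import re

CALL_START = re.compile(r"a\.b\(")  # start of the call, dot and parenthesis escaped

def append_call(source: str, suffix: str = ".c()") -> str:
    """Append suffix after every balanced a.b(...) call found in source."""
    out = []
    pos = 0
    while True:
        m = CALL_START.search(source, pos)
        if m is None:
            break
        depth = 1          # we are just past the opening parenthesis of a.b(
        i = m.end()
        while i < len(source) and depth:
            if source[i] == "(":
                depth += 1
            elif source[i] == ")":
                depth -= 1
            i += 1
        if depth:          # unbalanced call: leave the remaining text untouched
            break
        out.append(source[pos:i])
        out.append(suffix)
        pos = i            # continue after the call; a.b(...) nested in its arguments is skipped
    out.append(source[pos:])
    return "".join(out)

code = 'X(a.b(\n    param1=D(),\n    param2="qwerty"\n    )\n);\n'
print(append_call(code))
```

Since the scanner works on the whole file content rather than line by line, the indented multi-line calls from the question's update are handled the same way as the one-line ones.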