Search code examples
pythonregexregex-lookaroundspositive-lookbehind

Replace every symbol of the word after delimiter using python re


I would like to replace every symbol of a word after - with *.

For example:

asd-wqe ffvrf    =>    asd-*** ffvrf

In TS regex it could be done with (?<=-\w*)\w and replacement *. But default python regex engine requires lookbehinds of fixed width.

Best I can imaging is to use

(?:(?<=-)|(?<=-\w)|(?<=-\w{2}))\w

and repeat lookbehing some predetermined big number of times, but it seems not very sustainable or elegant.

Is it possible to use default re module for such a task with some more elegant pattern?

Demo for testing here.

P.S. I'm aware that alternative regex engines, that support lookbehind of variable length exist, but would like to stick with default one for a moment if possible.


Solution

  • I think you can not do that with Python re, as you want to match a single character knowing that to the left is - followed by optional word characters.

    I would write it like this with a callback and then get the length of the match for the replacement of the * chars

    import re
    
    strings = [
        "asd-wqe ffvrf",
        "asd-ss sd",
        "a-word",
        "a-verylongword",
        "an-extremelyverylongword"
    ]
    pattern = r"(?<=-)\w+"
    for s in strings:
        print(re.sub(pattern, lambda x: len(x.group()) * "*", s))
    

    Output

    asd-*** ffvrf
    asd-** sd
    a-****
    a-************
    an-*********************
    

    See a python demo.


    An alternative to a quantifier in a lookbehind assertion is using the \G anchor (which is also not supported by Python re)

     (?:-|\G(?!^))\K\w
    

    Regex demo