python, python-3.x, regex, data-extraction

remove n before a string


I want to remove the unwanted r and n at the beginning of each capitalized word and each number in this string. I tried a regex, but I am not sure whether a regex or some other method is the right tool here.

This is the code I am trying to use:

import re

text = "nFamily n49 new nTom"

regex_pattern = re.compile(r'.*n[A-Z][a-z]*|[0-9]*\s')
matches = regex_pattern.findall(text)
for match in matches:
    text = text.replace(match, " ")
print(text)

Expected output:

Family 49 new Tom

Solution

  • You can use

    text = re.sub(r'\bn(?=[A-Z0-9])', '', text)
    

    Details:

    • \b - a word boundary; here it matches at the start of a word
    • n - the letter n
    • (?=[A-Z0-9]) - a positive lookahead that requires an uppercase ASCII letter or a digit to be present immediately to the right of the current location.
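
To see what the \b contributes, here is a minimal sketch that runs the pattern with and without it. The sample string and the variable names are made up for illustration; "plan9" does not appear in the question.

```python
import re

with_boundary = r"\bn(?=[A-Z0-9])"
without_boundary = r"n(?=[A-Z0-9])"

# Hypothetical input: "plan9" has an n before a digit, but not at the start of a word.
sample = "nFamily plan9 nTom"

kept = re.sub(with_boundary, "", sample)
damaged = re.sub(without_boundary, "", sample)

print(kept)     # the n inside "plan9" is left alone
print(damaged)  # without \b, the n inside "plan9" is removed as well
```

The lookahead only checks the next character and does not consume it, so the uppercase letter or digit that follows the n stays in the result.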

    See the Python demo:

    import re
    rx = r"\bn(?=[A-Z0-9])"
    text = "nFamily n49 new nTom"
    print(re.sub(rx, '', text))
    # => Family 49 new Tom
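
The question also mentions a stray r. Assuming those prefixes can be r, n, or a run of both (the sample string in the question only contains n), one possible extension is to swap the single n for a character class. The helper name strip_prefixes and the test input below are hypothetical.

```python
import re

# Assumption: unwanted prefixes are "r", "n", or runs such as "rn",
# always followed directly by an uppercase ASCII letter or a digit.
PREFIX_RX = re.compile(r"\b[rn]+(?=[A-Z0-9])")


def strip_prefixes(text):
    """Remove leading r/n runs from capitalized words and numbers."""
    return PREFIX_RX.sub("", text)


cleaned = strip_prefixes("nFamily rn49 new rTom")
print(cleaned)
```

Like the original pattern, this cannot tell a stray prefix from a real token that happens to start with r or n followed by a capital letter or digit, so it is only safe when such tokens do not occur in the data.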