Tags: python, python-3.x, regex

Regex to substitute the next two words after a matching point


I'm writing a regex to substitute up to two words that follow a matching prefix.

Expected prefixes: dr, doctor, pr, professor

Sample text:

Examination carried out in agreement with and in the presence of Dr John Doe (rhythmologist).

Expected outcome:

Examination carried out in agreement with and in the presence of Dr [DOCTOR_NAME] (rhythmologist).

Here is my current Regex:

(\s|^|^(.*)|\()(dr|doctor|pr|professor)(\s|[.])(\s*([A-Z]\w+)){0,2}

However, it doesn't account for a closing parenthesis.

I'd really appreciate help improving the regex. Thank you!


Solution

  • You seem to be making this more complicated than it has to be. You want to match a doctor or professor title, either as a full word or as an abbreviation, followed by a name of one or two words. Then use this pattern:

    \b(dr|doctor|pr|professor)\b[.]?\s+(\w+(?: \w+)?)
    

    and replace with this (in Python's re.sub, the backreferences are written \1 and \2 rather than $1 and $2):

    $1 [$2]
    


    Explanation of pattern:

    • \b(dr|doctor|pr|professor)\b match title
    • [.]? optional dot
    • \s+ one or more whitespace characters
    • (\w+(?: \w+)?) then match and capture in $2 a name of one or two words
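
Since the question is tagged python, here is a minimal sketch of how the pattern above might be applied with re.sub. A few things in it are assumptions rather than part of the original answer: the helper name mask_names is hypothetical, the re.IGNORECASE flag is added so that the lowercase titles also match "Dr" in the sample text, and the name is replaced with the fixed placeholder [DOCTOR_NAME] from the question's expected outcome instead of being echoed back in brackets.

```python
import re

# Pattern from the answer above. IGNORECASE is an added assumption so that
# "dr" also matches "Dr" or "DR" in real text.
TITLE_NAME = re.compile(
    r"\b(dr|doctor|pr|professor)\b[.]?\s+(\w+(?: \w+)?)",
    re.IGNORECASE,
)


def mask_names(text, placeholder="[DOCTOR_NAME]"):
    """Keep the title (group 1) and swap the captured name (group 2)
    for a fixed placeholder. Hypothetical helper for illustration."""
    # A function replacement avoids any backslash-escaping issues
    # in the placeholder string.
    return TITLE_NAME.sub(lambda m: f"{m.group(1)} {placeholder}", text)


sample = (
    "Examination carried out in agreement with and in the presence of "
    "Dr John Doe (rhythmologist)."
)
print(mask_names(sample))
```

Two caveats with this sketch: the optional dot after the title is not captured, so "Dr. John Doe" comes back as "Dr [DOCTOR_NAME]" without the dot, and because \w+ matches any word, a sentence such as "Dr Doe was present" would have both "Doe" and "was" masked. If names are always capitalized, restricting each name word to start with an uppercase letter (and scoping the case-insensitive flag to the title only) may be a reasonable refinement.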