python, regex, python-re

Regular Expression with Two Names: One With Middle Initial and One Without


I'm trying to extract the names from this string using a regular expression.

Example text:

Elon R. Musk (245)436-7956 Jeff Bezos (235)231-3432

What I've tried so far only seems to work for names without a middle initial:

([A-Z]{1}[a-z]+) ([A-Z]{1}[a-z]+)

Here's an example of Python code using the re module:

import re

strr = 'Elon R. Musk (245)436-7956 Jeff Bezos (235)231-3432'

def gimmethenamesdammit(strr):
    regex = re.compile("([A-Z]{1}[a-z]+) ([A-Z]{1}[a-z]+)")
    print(regex.findall(strr))

gimmethenamesdammit(strr)

How can I modify the regular expression above so that it matches both Elon R. Musk and Jeff Bezos?

Desired output when running gimmethenamesdammit(strr):

gimmethenamesdammit(strr)

[('Elon', 'R.', 'Musk'), ('Jeff', 'Bezos')]
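The desired output mixes a three-element tuple and a two-element tuple, which re.findall cannot produce on its own. When a pattern has capturing groups, findall returns one tuple element per group, and an optional group that did not take part in the match comes back as an empty string. A minimal sketch that reaches the desired shape is to capture all three parts and drop the empty strings afterwards. It assumes every name is a capitalised first name, an optional single capital letter followed by a period, and a capitalised last name; the compiled NAME pattern is illustrative, not taken from the question.

```python
import re

strr = 'Elon R. Musk (245)436-7956 Jeff Bezos (235)231-3432'

# Assumed name shape: first name, an optional "X." middle initial, last name.
# The initial has its own capturing group inside an optional non-capturing group.
NAME = re.compile(r"([A-Z][a-z]+)\s(?:([A-Z]\.)\s)?([A-Z][a-z]+)")


def gimmethenamesdammit(strr):
    # findall returns one 3-tuple per match; a skipped initial comes back as ''
    matches = NAME.findall(strr)
    # Drop the empty strings so names without an initial become 2-tuples
    return [tuple(part for part in match if part) for match in matches]


print(gimmethenamesdammit(strr))
# [('Elon', 'R.', 'Musk'), ('Jeff', 'Bezos')]
```

The regex itself always has three groups; only the post-processing step makes the returned tuples vary in length.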

Solution

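  • The following regular expression matches a capitalised first name, an optional middle initial such as R., and a capitalised last name:

    import re
    
    strr = 'Elon R. Musk (245)436-7956 Jeff Bezos (235)231-3432'
    
    # First name, optional " X." middle initial, last name
    regex = r"[A-Z][a-z]+(?:\s[A-Z]\.)?\s[A-Z][a-z]+"
    
    POCs = re.findall(regex, strr)
    
    print(f"{POCs[0]}, {POCs[-1]}")  # Elon R. Musk, Jeff Bezos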