Tags: python, regex, capturing-group

Python: Regex: Detecting hyphenated names and non-hyphenated names with one regex


I need to extract people's names from a really long string.

Their names are in this format: LAST, FIRST.

Some of these people have hyphenated names. Some don't.

My attempt with a smaller string:

Input:

import re
text = 'Smith-Jones, Robert&Epson, Robert'
pattern = r'[A-Za-z]+(-[A-Za-z]+)?,\sRobert'
print(re.findall(pattern, text))

Expected output:

['Smith-Jones, Robert', 'Epson, Robert']

Actual output:

['-Jones', '']

What am I doing wrong?


Solution

  • Use

    import re
    text = 'Smith-Jones, Robert&Epson, Robert'
    pattern = r'[A-Za-z]+(?:-[A-Za-z]+)?,\sRobert'
    print(re.findall(pattern, text))
    # => ['Smith-Jones, Robert', 'Epson, Robert']
    

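    Just make the capturing group non-capturing. When a pattern contains capturing groups, re.findall returns the text captured by those groups instead of the whole match: a list of strings if there is one group, a list of tuples if there are several. That is why you got '-Jones' for the first match and '' for 'Epson, Robert', where the optional group did not participate. So the simplest fix for this pattern is to replace (...)? with (?:...)?.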
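To see that rule in action, here is a small sketch that runs the same text through three variants of the pattern: no groups, one optional group, and two groups. The variable names are just for illustration, and the two-group variant (which also generalizes the first name from the literal Robert to [A-Za-z]+) is an addition beyond the original answer.

```python
import re

text = 'Smith-Jones, Robert&Epson, Robert'

# No capturing groups: findall returns the full matches.
no_groups = re.findall(r'[A-Za-z]+(?:-[A-Za-z]+)?,\sRobert', text)

# One capturing group: findall returns only that group's text.
# For 'Epson, Robert' the optional group did not participate, so it yields ''.
one_group = re.findall(r'[A-Za-z]+(-[A-Za-z]+)?,\sRobert', text)

# Two or more capturing groups: findall returns one tuple of groups per match.
two_groups = re.findall(r'([A-Za-z]+(?:-[A-Za-z]+)?),\s([A-Za-z]+)', text)

print(no_groups)   # ['Smith-Jones, Robert', 'Epson, Robert']
print(one_group)   # ['-Jones', '']
print(two_groups)  # [('Smith-Jones', 'Robert'), ('Epson', 'Robert')]
```

The tuple form can be handy if you want LAST and FIRST separately rather than as one string.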

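If you do need capturing groups (for example to split LAST and FIRST) and still want the whole match, re.finditer is one option, since it yields match objects rather than strings. A minimal sketch, assuming names contain only ASCII letters with at most one hyphen in the last name; the generalized first-name part [A-Za-z]+ and the extra 'Lee, Anna' entry in the sample text are hypothetical additions for illustration:

```python
import re

# Hypothetical sample: the original text plus one extra entry.
text = 'Smith-Jones, Robert&Epson, Robert&Lee, Anna'

# Assumption: LAST and FIRST are ASCII letters only; LAST may contain one hyphen.
pattern = re.compile(r'([A-Za-z]+(?:-[A-Za-z]+)?),\s([A-Za-z]+)')

# finditer yields match objects, so group(0) still gives the entire match
# even though the pattern contains capturing groups.
full_names = [m.group(0) for m in pattern.finditer(text)]

# groups() returns the captured (LAST, FIRST) parts as a tuple.
last_first = [m.groups() for m in pattern.finditer(text)]

print(full_names)  # ['Smith-Jones, Robert', 'Epson, Robert', 'Lee, Anna']
print(last_first)  # [('Smith-Jones', 'Robert'), ('Epson', 'Robert'), ('Lee', 'Anna')]
```

Names containing spaces, apostrophes or non-ASCII letters would need a broader character class than [A-Za-z].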