python regex findall apostrophe capitalization

Find consecutive capitalized words in a string, including apostrophes


I am using regex to find all runs of consecutive capitalized words, where some of the words may contain an apostrophe, e.g. "Molly’s Munchies" in "The mother-daughter bakery, Molly’s Munchies, was founded in 2009". I have written a few lines of code to do this:

import re

string = "The mother-daughter bakery, Molly’s Munchies, was founded in 2009"
# Raw string so that \s reaches the regex engine unchanged
test = re.findall(r"([A-Z][a-z]+(?=\s[A-Z])(?:\s[A-Z][a-z]+)+)", string)
print(test)

The issue is that I am unable to print the expected result, "Molly’s Munchies".

Instead my output is:

[]

Desired output:

['Molly’s Munchies']

Any help appreciated, thank you!


Solution

  • You may use this regex in Python:

    r"\b[A-Z][a-z'’]*(?:\s+[A-Z][a-z'’]*)+"
    


    RegEx Details:

    • \b: Word boundary
    • [A-Z]: Match a capital letter
    • [a-z'’]*: Match 0 or more characters, each a lowercase letter, ' or ’
    • (?:\s+[A-Z][a-z'’]*)+: Match 1 or more further capitalized words, each preceded by whitespace
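
As a rough sketch, the pattern above could be applied to the string from the question as follows. The helper name find_capitalized_runs is hypothetical and exists only for illustration. Because the pattern uses only a non-capturing group, re.findall returns the full matched text.

```python
import re

# Pattern from the answer: a capitalized word followed by one or more
# further capitalized words; straight (') and curly (’) apostrophes are
# allowed inside each word.
PATTERN = re.compile(r"\b[A-Z][a-z'’]*(?:\s+[A-Z][a-z'’]*)+")


def find_capitalized_runs(text):
    """Return every run of two or more consecutive capitalized words.

    Hypothetical helper, not part of the original answer.
    """
    return PATTERN.findall(text)


string = "The mother-daughter bakery, Molly’s Munchies, was founded in 2009"
print(find_capitalized_runs(string))  # ['Molly’s Munchies']
```

Note that the character classes cover ASCII letters only, so names with accented letters (for example "Café") would likely need a wider class.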