I am using regex to find all runs of consecutive capitalized words, where some of the words may contain an apostrophe, e.g. "The mother-daughter bakery, Molly’s Munchies, was founded in 2009". I have written a few lines of code to do this:
import re

string = "The mother-daughter bakery, Molly’s Munchies, was founded in 2009"
test = re.findall(r"([A-Z][a-z]+(?=\s[A-Z])(?:\s[A-Z][a-z]+)+)", string)
print(test)
The issue is that I am unable to get the result 'Molly’s Munchies'.
Instead my output is:
[]
Desired output:
['Molly’s Munchies']
Any help appreciated, thank you!
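Your pattern returns nothing because [a-z]+ cannot match the apostrophe in "Molly’s", so the lookahead for whitespace after "Molly" never succeeds. You may use this regex in Python: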
r"\b[A-Z][a-z'’]*(?:\s+[A-Z][a-z'’]*)+"
RegEx Details:
\b: Word boundary
[A-Z]: Match a capital letter
[a-z'’]*: Match 0 or more characters, each a lowercase letter, ' or ’
(?:\s+[A-Z][a-z'’]*)+: Match 1 or more further capitalized words, each preceded by whitespace
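As a quick check, here is a minimal sketch that runs both the pattern from the question and the suggested one against the sample sentence. The variable names (text, old_pattern, new_pattern) are only illustrative, and the sample string is assumed to use the curly apostrophe (’) exactly as in the question.

```python
import re

# Sample sentence from the question; note the curly apostrophe in "Molly’s".
text = "The mother-daughter bakery, Molly’s Munchies, was founded in 2009"

# Pattern from the question: [a-z]+ stops at the apostrophe, so nothing matches.
old_pattern = re.compile(r"([A-Z][a-z]+(?=\s[A-Z])(?:\s[A-Z][a-z]+)+)")

# Suggested pattern: the character class also accepts ' and ’.
new_pattern = re.compile(r"\b[A-Z][a-z'’]*(?:\s+[A-Z][a-z'’]*)+")

old_matches = old_pattern.findall(text)
new_matches = new_pattern.findall(text)

print(old_matches)  # []
print(new_matches)  # ['Molly’s Munchies']
```

Because the suggested pattern has no capturing group, re.findall returns the full matched text. Note that [a-z'’]* also allows a single capital letter to count as a word, so a phrase like "A Tale" would match as well.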