python regex acronym

Finding Acronyms Using Regex In Python


I'm trying to use regex in Python to match acronyms separated by periods. I have the following code:

import re
test_string = "U.S.A."
pattern = r'([A-Z]\.)+'
print(re.findall(pattern, test_string))

The result of this is:

['A.']

I'm confused as to why this is the result. I know + is greedy, but why are the first occurrences of [A-Z]\. ignored?


Solution

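  • The (...) in regex creates a capturing group. When a pattern contains a capturing group, re.findall returns the text captured by the group instead of the whole match, and a repeated group keeps only its last repetition, which here is 'A.'. I suggest changing to a non-capturing group, (?:...), so that findall returns the full match: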

    pattern = r'(?:[A-Z]\.)+'
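
A minimal sketch comparing the two patterns on the test string from the question. The variable names (capturing, non_capturing, via_finditer) are only illustrative, and the re.finditer variant is an extra option that the answer itself does not mention:

```python
import re

test_string = "U.S.A."

# Capturing group: findall returns what the group captured, and a
# repeated group only remembers its final repetition.
capturing = re.findall(r'([A-Z]\.)+', test_string)

# Non-capturing group: there is no group to report, so findall
# returns the whole match.
non_capturing = re.findall(r'(?:[A-Z]\.)+', test_string)

# Alternative (illustrative): keep the original pattern and read the
# whole match from each match object with group(0).
via_finditer = [m.group(0) for m in re.finditer(r'([A-Z]\.)+', test_string)]

print(capturing)      # ['A.']
print(non_capturing)  # ['U.S.A.']
print(via_finditer)   # ['U.S.A.']
```

The non-capturing form is probably the simplest fix when only the full acronym is needed; group(0) is handy if the capturing group has to stay for some other reason.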