
How to find all words that start with an uppercase letter using Python Regex


The code below reads a file and adds every word that begins with an uppercase letter to a list, including words written entirely in capitals. How can I change it so that only words that start with an uppercase letter are added, and all-caps words are skipped?

import re

matches = []
regex = r"\b[A-Z]\w*"
filename = r'C:\Users\Documents\romeo.txt'
with open(filename, 'r') as f:
    for line in f:
        matches += re.findall(regex, line)
print(matches)

File:

Hello, How are YOU

Desired output:

[Hello,How]

YOU should not be included in the output.


Solution

  • \w matches uppercase and lowercase letters, as well as digits and underscores. If you only want lowercase letters after the initial capital, specify it like this:

    import re

    regex = r"\b[A-Z][a-z]*\b"
    text = 'Hello, How are YOU'
    re.findall(regex, text) # ['Hello', 'How']
    

    Have a look at the Python regular expression syntax in the documentation to learn about other options.
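To apply the same pattern to the line-by-line file loop from the question, something like the following minimal sketch could work. The helper name `find_capitalized` and the constant `CAPITALIZED` are made up for illustration; only the pattern itself comes from the answer above. Note that a single uppercase letter such as "I" also matches, because `[a-z]*` allows zero lowercase letters.

```python
import re

# Pattern from the answer: one uppercase letter, then only lowercase
# letters, bounded by word boundaries on both sides.
CAPITALIZED = re.compile(r"\b[A-Z][a-z]*\b")


def find_capitalized(lines):
    """Collect matching words from any iterable of strings (hypothetical helper)."""
    matches = []
    for line in lines:
        matches += CAPITALIZED.findall(line)
    return matches


print(find_capitalized(["Hello, How are YOU"]))  # ['Hello', 'How']
```

Because an open file iterates over its lines, the same function should also accept a file object, for example `with open(filename, 'r') as f: matches = find_capitalized(f)`.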