Below is the code to find all upper case words in a file and add them to a list. How can I change it so that only words that start with an upper case letter, followed by lower case letters, are added to the list?
import re
matches = []
regex = r"\b[A-Z]\w*"
filename = r'C:\Users\Documents\romeo.txt'
with open(filename, 'r') as f:
    for line in f:
        matches += re.findall(regex, line)
print(matches)
File:
Hello, How are YOU
Expected output:
['Hello', 'How']
YOU should not be included in the output.
\w matches both upper and lower case letters, as well as digits and underscores. If you only want lower case letters after the initial capital, specify it like this:
import re
regex = r"\b[A-Z][a-z]*\b"
text = 'Hello, How are YOU'
re.findall(regex, text) # ['Hello', 'How']
Have a look at the Python regular expression syntax in the documentation to learn about other options.
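To tie this back to the file loop in the question, here is a minimal sketch of the same pattern applied line by line. The names CAPITALIZED and capitalized_words are made up for illustration, and the sample lines stand in for the contents of the file; treat it as one possible arrangement rather than the only way to do it.

```python
import re

# A capital letter followed only by lower case letters, matched as a whole word.
# CAPITALIZED and capitalized_words are hypothetical names used for this sketch.
CAPITALIZED = re.compile(r"\b[A-Z][a-z]*\b")

def capitalized_words(lines):
    """Collect capitalized words from any iterable of text lines."""
    matches = []
    for line in lines:
        matches += CAPITALIZED.findall(line)
    return matches

# Sample lines standing in for the file contents (assumed data).
sample = ["Hello, How are YOU\n", "I met McDonald and R2D2\n"]
print(capitalized_words(sample))  # ['Hello', 'How', 'I']

# With a real file, pass the open file object directly:
# with open(filename, 'r') as f:
#     print(capitalized_words(f))
```

Note that a single capital such as "I" still matches, because [a-z]* allows zero lower case letters, while mixed-case words such as "McDonald" are skipped entirely. Use [a-z]+ instead if one-letter words should be excluded.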