python, python-3.x, regex, file, python-re

How to find all words whose first letter is uppercase using Python regex


I need to find all the words in a file that start with an uppercase letter. I tried the code below, but it finds no matches: every line yields an empty list.

import os
import re

matches = []

filename = 'C://Users/Documents/romeo.txt'
with open(filename, 'r') as f:
    for line in f:
        regex = "^[A-Z]\w*$"
        matches.append(re.findall(regex, line))
print(matches)

File:

Hi, How are You?

Expected output:

[Hi,How,You]

Solution

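  • You can use a word boundary instead of the anchors ^ and $. With the anchors, the whole line has to be a single word, so a line that contains several words never matches.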

    \b[A-Z]\w*
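
    To see the difference, here is a minimal sketch that runs both patterns on the sample line from the question. The variable names are only illustrative.

```python
import re

# The sample line from the question, including the newline read from the file.
line = "Hi, How are You?\n"

# Anchored pattern: the whole line must be one capitalized word, so nothing matches.
anchored = re.findall(r"^[A-Z]\w*$", line)

# Word-boundary pattern: every capitalized word matches on its own.
bounded = re.findall(r"\b[A-Z]\w*", line)

print(anchored)  # []
print(bounded)   # ['Hi', 'How', 'You']
```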
    

    Note that re.findall returns a list, so matches.append adds that whole list as a single item and you end up with a list of lists. Use += (or matches.extend) to add the individual matches instead.

    import re
    
    matches = []
    regex = r"\b[A-Z]\w*"
    filename = r'C:\Users\Documents\romeo.txt'
    with open(filename, 'r') as f:
        for line in f:
            matches += re.findall(regex, line)
    print(matches)
    

    Output

    ['Hi', 'How', 'You']
    

    If the word must be preceded by whitespace or the start of the string, you could also use a negative lookbehind:

    (?<!\S)[A-Z]\w*
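
    As a rough illustration of how the two boundaries differ, the sketch below uses a made-up sample string (not from the question) in which some capitalized words follow punctuation rather than whitespace.

```python
import re

# Hypothetical sample text: "You" follows a hyphen and "Tag" follows a "#".
text = "Hi, How are-You? #Tag"

# \b matches between any non-word char and a word char.
word_boundary = re.findall(r"\b[A-Z]\w*", text)

# (?<!\S) only allows whitespace or the start of the string to the left.
whitespace_boundary = re.findall(r"(?<!\S)[A-Z]\w*", text)

print(word_boundary)        # ['Hi', 'How', 'You', 'Tag']
print(whitespace_boundary)  # ['Hi', 'How']
```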
    


    If you don't want to match words that consist only of uppercase chars, you could use, for example, a negative lookahead to assert that what follows is not only uppercase chars up to a word boundary:

    \b[A-Z](?![A-Z]*\b)\w*
    
    • \b A word boundary to prevent a partial match
    • [A-Z] Match an uppercase char A-Z
    • (?![A-Z]*\b) Negative lookahead, assert not only uppercase chars followed by a word boundary
    • \w* Match optional word chars
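
    A small sketch of this pattern on a made-up sentence (the sample text is an assumption, not from the question). Note that a single uppercase letter such as "A" also counts as an all-uppercase word, so it is skipped as well.

```python
import re

# Hypothetical sample text with all-caps words and a single-letter word.
text = "NASA said Hi to McDonald in USA, A"

# The lookahead rejects a start position when only uppercase chars
# (possibly none) remain before the next word boundary.
not_all_caps = re.findall(r"\b[A-Z](?![A-Z]*\b)\w*", text)

print(not_all_caps)  # ['Hi', 'McDonald']
```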


    To match a word that starts with an uppercase char, and does not contain any more uppercase chars:

    \b[A-Z][^\WA-Z]*\b
    
    • \b A word boundary
    • [A-Z] Match an uppercase char A-Z
    • [^\WA-Z]* Match zero or more word chars other than A-Z
    • \b A word boundary
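
    For comparison, a minimal sketch of this stricter pattern on another made-up string. Here digits and underscores are still allowed after the first letter, and a lone uppercase letter does match.

```python
import re

# Hypothetical sample text mixing capitalized, mixed-case and all-caps words.
text = "Hi McDonald USA Romeo_1 A"

# Only the first char may be uppercase; the rest must be word chars outside A-Z.
single_capital = re.findall(r"\b[A-Z][^\WA-Z]*\b", text)

print(single_capital)  # ['Hi', 'Romeo_1', 'A']
```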
