python regex

Regex to find words starting with capital letters not at beginning of sentence


I've managed to find the words beginning with capital letters, but I can't figure out a regex that filters out the ones at the beginning of a sentence.

Each sentence ends with a full stop and a space.

  • Test_string = This is a Test sentence. The sentence is Supposed to Ignore the Words at the beginning of the Sentence.

  • Desired output = ['Test', 'Supposed', 'Ignore', 'Words', 'Sentence']

I'm coding in Python. Will be glad if someone can help me out with the regex :)


Solution

  • You may use the following expression:

    (?<!^)(?<!\. )[A-Z][a-z]+
    



    import re

    mystr = "This is a Test sentence. The sentence is Supposed to Ignore the Words at the beginning of the Sentence."

    # findall returns every non-overlapping match as a list of strings
    print(re.findall(r'(?<!^)(?<!\. )[A-Z][a-z]+', mystr))
    

    Prints:

    ['Test', 'Supposed', 'Ignore', 'Words', 'Sentence']
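
To see why the expression works, it may help to spell it out piece by piece. The sketch below is the same pattern written with `re.VERBOSE`, so each part carries a comment. The names `CAPITALIZED_MID_SENTENCE` and `mid_sentence_capitals` are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as in the answer, written in verbose mode.
# CAPITALIZED_MID_SENTENCE is a hypothetical name used only in this sketch.
CAPITALIZED_MID_SENTENCE = re.compile(
    r"""
    (?<!^)      # negative lookbehind: not at the very start of the string
    (?<!\.[ ])  # negative lookbehind: not right after a full stop and a space
    [A-Z]       # one uppercase ASCII letter
    [a-z]+      # followed by one or more lowercase ASCII letters
    """,
    re.VERBOSE,
)


def mid_sentence_capitals(text):
    """Return capitalized words that do not open a sentence (hypothetical helper)."""
    return CAPITALIZED_MID_SENTENCE.findall(text)


mystr = "This is a Test sentence. The sentence is Supposed to Ignore the Words at the beginning of the Sentence."
print(mid_sentence_capitals(mystr))
# ['Test', 'Supposed', 'Ignore', 'Words', 'Sentence']
```

Two things to keep in mind with this approach: `[a-z]+` needs at least one lowercase letter, so a lone capital such as "I" is never returned, and only ". " counts as a sentence boundary, so a word after "? " or "! " would still be matched. That fits the question's assumption that each sentence ends with a full stop and a space.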