python string file search capitalization

Counting the number of capitalized words in a text file in Python

I need help counting the capitalized words in a text file.

I have tried multiple variations of a for loop. I have also tried stripping all spaces and counting the letters in the file individually, as well as several variations of the strip function and different if statements.

for character in file:
    if character.isupper():
        capital += 1
        file.readline().rstrip()
        break

print(capital)

I expect the program to read each word or letter in the document and return the total number of capitalized words it contains.

Solution

Let's say we have an example file doc.txt with this content:

This is a test file for identifying Capital Words. I created this as an Example because the question's requirements could vary. For instance, should acronyms like SQL count as capital words? If no: this should result in eight capital words. If yes: this should result in nine.

If you want to count capitalized (title-case) words but exclude all-caps words such as acronyms, you could use str.istitle(), like this:

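def count_capital_words(filename):
    count = 0
    with open(filename, 'r') as fp:
        for line in fp:
            # split() with no arguments splits on any run of whitespace
            for word in line.split():
                # istitle() is True for "Capital" but False for "SQL" or "test"
                if word.istitle():
                    print(word)  # show each word that is counted
                    count += 1
    return count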


print(count_capital_words('doc.txt'))  # 8
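
One caveat worth knowing: str.istitle() treats an apostrophe as a word boundary, so a word like "Don't" is not considered title case. If that matters for your file, a hand-rolled check is one option. The helper below is only a sketch; the name is_capitalized and its rule for single letters such as "I" are assumptions, not part of the question's requirements:

```python
def is_capitalized(word):
    """Return True for words like 'Capital' or "Don't", False for 'SQL' or 'test'."""
    letters = [c for c in word if c.isalpha()]
    if not letters or not word[0].isupper():
        return False
    # A single letter such as "I" counts; longer all-caps words are treated as acronyms.
    return len(letters) == 1 or not all(c.isupper() for c in letters)


print("Don't".istitle())        # False: the "t" follows an apostrophe
print(is_capitalized("Don't"))  # True
print(is_capitalized("SQL"))    # False
```

Note that this sketch still rejects words that start with punctuation, such as an opening quotation mark, because it looks only at the first character.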

If all-caps words should be counted too, you could modify the function to check only the first letter of each word. Note that line.split() with no arguments never yields empty strings, so word[0] cannot raise an IndexError here; the filter(None, ...) wrapper is only a defensive guard that would matter if you split on an explicit separator, such as line.split(' '):

def count_capital_words(filename):
    count = 0
    with open(filename, 'r') as fp:
        for line in fp:
            for word in filter(None, line.split()):
                if word[0].isupper():
                    count += 1
    return count


print(count_capital_words('doc.txt'))  # 9
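
To see when empty strings can actually show up, here is a quick comparison of the two ways of calling split(). The sample string is made up for illustration:

```python
text = "  two  spaces "

# No argument: runs of whitespace are collapsed, so no empty strings appear.
print(text.split())                          # ['two', 'spaces']

# Explicit separator: every single space splits, leaving empty strings behind.
print(text.split(" "))                       # ['', '', 'two', '', 'spaces', '']

# filter(None, ...) drops the falsy (empty) items again.
print(list(filter(None, text.split(" "))))   # ['two', 'spaces']
```

So indexing word[0] is only risky in the second form, where an empty string would raise an IndexError.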

If you have more complicated requirements, you can get an iterable of words like this:

from itertools import chain


def get_words(filename):
    with open(filename, 'r') as fp:
        words = chain.from_iterable(line.split() for line in fp)
        yield from words
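
Once you have an iterable of words, the counting rule can be passed in as a function. The following is a minimal sketch; count_words and the sample sentence are hypothetical names and data used only for illustration:

```python
def count_words(words, predicate):
    """Count the items in `words` for which `predicate(word)` is true."""
    return sum(1 for word in words if predicate(word))


sample = "This is a Test of SQL and Other Words".split()

# Title-case words only (acronyms excluded).
print(count_words(sample, str.istitle))               # 4

# Any word whose first character is uppercase (acronyms included).
print(count_words(sample, lambda w: w[0].isupper()))  # 5
```

With the generator above, the same call would look like count_words(get_words('doc.txt'), str.istitle), and any other rule can be swapped in without touching the file-reading code.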