Tags: python, string, file, search, capitalization

Counting the number of capitalized words in a text file in Python


I need help sorting through a text file.

I have tried multiple variations of a for loop. I have also tried stripping all spaces and counting the letters in the file individually, as well as several variations of the strip function and different if statements.

for character in file:
    if character.isupper():
        capital += 1
        file.readline().rstrip()
        break

print(capital)

I expect the program to read each word or letter in the document and return the total number of capitalized words it contains.


Solution

  • Let's say we have an example file doc.txt with this content:

    This is a test file for identifying Capital Words. I created this as an Example because the question's requirements could vary. For instance, should acronyms like SQL count as capital words? If no: this should result in eight capital words. If yes: this should result in nine.

    To count capitalized (title-case) words while excluding all-caps words such as acronyms, you could do something like this:

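    def count_capital_words(filename):
        """Count title-case words; all-caps words such as SQL are skipped."""
        count = 0
        with open(filename, 'r') as fp:
            for line in fp:
                # split() with no arguments splits on any run of whitespace
                for word in line.split():
                    if word.istitle():
                        print(word)  # debugging aid: shows each matched word
                        count += 1
        return count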
    
    
    print(count_capital_words('doc.txt'))  # 8
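To make the difference between the two checks concrete, here is a small sketch comparing str.istitle() with a first-letter test. The sample tokens are chosen for illustration only and are not read from doc.txt; note that trailing punctuation does not affect istitle(), but an apostrophe inside a word does.

```python
# Sample tokens as line.split() might return them (chosen for illustration).
samples = ["This", "Words.", "I", "SQL", "question's", "Don't"]

for word in samples:
    # istitle() requires every run of cased letters to start with an
    # uppercase letter that is followed only by lowercase letters.
    print(f"{word!r}: istitle={word.istitle()}, first_upper={word[0].isupper()}")

title_case = [w for w in samples if w.istitle()]
first_upper = [w for w in samples if w[0].isupper()]

print(title_case)   # ['This', 'Words.', 'I']
print(first_upper)  # ['This', 'Words.', 'I', 'SQL', "Don't"]
```

If contractions such as "Don't" should count as capitalized, the first-letter test is probably the safer choice.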
    
    

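    If all-caps words should be counted too, you could modify the function to check only the first letter of each word. Note that line.split() with no arguments never returns empty strings, so word[0] is already safe here; the filter(None, ...) call is a defensive guard that would only matter if you split on an explicit separator such as line.split(' '), where empty strings could otherwise raise an IndexError: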

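    def count_capital_words(filename):
        """Count words whose first character is uppercase (includes acronyms)."""
        count = 0
        with open(filename, 'r') as fp:
            for line in fp:
                # filter(None, ...) drops empty strings, so word[0] cannot raise IndexError
                for word in filter(None, line.split()):
                    if word[0].isupper():
                        count += 1
        return count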
    
    
    print(count_capital_words('doc.txt'))  # 9
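As a quick check of how splitting interacts with the empty-string guard, the sketch below compares split() with and without an explicit separator. The sample line and the starts_upper helper are made up for illustration.

```python
# Hypothetical sample line with leading, trailing, and repeated spaces.
line = "  SQL  and   Python \n"

# With no arguments, split() splits on runs of whitespace and
# never returns empty strings.
words_default = line.split()

# With an explicit separator, consecutive separators yield empty strings.
words_explicit = line.split(" ")


def starts_upper(words):
    """Count non-empty words whose first character is uppercase."""
    return sum(1 for word in filter(None, words) if word[0].isupper())


print(words_default)                 # ['SQL', 'and', 'Python']
print("" in words_explicit)          # True
print(starts_upper(words_default))   # 2
print(starts_upper(words_explicit))  # 2
```

Without the filter, indexing word[0] on the explicitly separated list would raise an IndexError at the first empty string.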
    
    

    If you have more complicated requirements, you can get an iterable of words like this:

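    from itertools import chain


    def get_words(filename):
        """Lazily yield every whitespace-separated word in the file."""
        with open(filename, 'r') as fp:
            # Flatten the per-line word lists into one stream of words
            words = chain.from_iterable(line.split() for line in fp)
            yield from words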
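One way to use such a word iterable is to pair it with a predicate that encodes your own rule. The sketch below is only an illustration: is_capital_word, count_matching, the min_length parameter, and the sample text are all hypothetical names and data, and the rule (strip punctuation, require title case and at least two characters) is just one possible choice.

```python
import string
import tempfile
from itertools import chain
from pathlib import Path


def get_words(filename):
    """Lazily yield every whitespace-separated word in the file."""
    with open(filename, 'r') as fp:
        yield from chain.from_iterable(line.split() for line in fp)


def is_capital_word(word, min_length=2):
    """Hypothetical rule: title case after stripping surrounding punctuation."""
    stripped = word.strip(string.punctuation)
    return len(stripped) >= min_length and stripped.istitle()


def count_matching(words, predicate):
    """Count the words for which predicate(word) is true."""
    return sum(1 for word in words if predicate(word))


# Demo on a throwaway file so the sketch runs without doc.txt.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"
    path.write_text('I like SQL. Does Python count? Yes, "Quoted" words too.\n')
    print(count_matching(get_words(path), is_capital_word))  # 4
```

Because the counting logic only sees an iterable of words, the rule can be changed by swapping the predicate without touching the file handling.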