I need help sorting through a text file
I have tried multiple variations of a for loop. I have also tried stripping all spaces and counting the letters in the file individually, as well as multiple variations of the strip function and different if statements.
    for character in file:
        if character.isupper():
            capital += 1
            file.readline().rstrip()
            break

    print(capital)
I expect the program to read each word or letter in the document and return the total number of capitalized words it contains.
Let's say we have an example file doc.txt with this content:
This is a test file for identifying Capital Words. I created this as an Example because the question's requirements could vary. For instance, should acronyms like SQL count as capital words? If no: this should result in eight capital words. If yes: this should result in nine.
If you wanted to count the capital (aka title case) words, but exclude all-caps words like acronyms, you could do something like this:
    def count_capital_words(filename):
        count = 0
        with open(filename, 'r') as fp:
            for line in fp:
                for word in line.split():
                    # istitle() is False for all-caps words such as "SQL"
                    if word.istitle():
                        print(word)
                        count += 1
        return count

    print(count_capital_words('doc.txt'))  # prints each matching word, then 8
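If all-caps words should be counted, you could modify the function to check only the first letter of each word. Note that filter(None, ...) guarantees word is never an empty string, which avoids the IndexError that word[0] would raise on one. With a bare line.split() this is only a defensive guard, since splitting on whitespace already discards empty strings; it matters if you later split on an explicit separator: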
    def count_capital_words(filename):
        count = 0
        with open(filename, 'r') as fp:
            for line in fp:
                for word in filter(None, line.split()):
                    if word[0].isupper():
                        count += 1
        return count

    print(count_capital_words('doc.txt'))  # 9
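The two checks used above disagree on more than acronyms, so it may help to compare them side by side before picking one. This is only a small sketch; the sample words are made up for illustration and are not taken from doc.txt.

```python
# Hypothetical sample words chosen to show where the two checks differ.
samples = ["This", "SQL", "Words.", "It's", "McDonald", "iPhone"]

# Map each word to (istitle result, first-letter-uppercase result).
results = {word: (word.istitle(), word[0].isupper()) for word in samples}

for word, (title_case, first_upper) in results.items():
    print(f"{word!r:12} istitle={title_case!s:6} first_upper={first_upper}")
```

Note that istitle() rejects "It's" and "McDonald" as well as "SQL": it requires every uppercase letter to follow an uncased character and every lowercase letter to follow a cased one, so an apostrophe or an inner capital breaks it. If words like these should count, the first-letter check is probably the safer choice.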
If you have more complicated requirements, you can get an iterable of words like this:
    from itertools import chain

    def get_words(filename):
        with open(filename, 'r') as fp:
            words = chain.from_iterable(line.split() for line in fp)
            yield from words
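As a rough sketch of how get_words could be used, the counting rule can be passed in as a function. The names count_words and starts_with_capital are hypothetical helpers introduced here for illustration, and the sample text is written to a temporary file only so the example runs without doc.txt.

```python
import os
import tempfile
from itertools import chain


def get_words(filename):
    """Yield every whitespace-separated word in the file, one at a time."""
    with open(filename, 'r') as fp:
        yield from chain.from_iterable(line.split() for line in fp)


def count_words(filename, predicate):
    """Count the words for which predicate(word) is true (hypothetical helper)."""
    return sum(1 for word in get_words(filename) if predicate(word))


def starts_with_capital(word):
    """Hypothetical rule: skip leading quotes/brackets, then test the first character."""
    stripped = word.lstrip('"\'([{')
    return bool(stripped) and stripped[0].isupper()


# Write a throwaway sample file (assumed content, not the original doc.txt).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('He said "Hello" to the SQL team.\n(Maybe) twice.\n')
    sample_path = tmp.name

title_case_count = count_words(sample_path, str.istitle)
leading_capital_count = count_words(sample_path, starts_with_capital)
os.remove(sample_path)

print(title_case_count)       # 3: He, "Hello", (Maybe)
print(leading_capital_count)  # 4: the same three plus SQL
```

Keeping the rule in a separate predicate means the file-reading code stays the same whichever definition of "capital word" you settle on.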