I'm having trouble trying to find the amount of unique words in a speech text file (well actually 3 files), I'm just going to give you my full code so there is no misunderstandings.
#This program will serve to analyze text files for the number of words in
#the text file, number of characters, sentances, unique words, and the longest
#word in the text file. This program will also provide the frequency of unique
#words. In particular, the text will be three political speeches which we will
#analyze, building on searching techniques in Python.
def main():
harper = readFile("Harper's Speech.txt")
newWords = cleanUpWords(harper)
print(numCharacters(harper), "Characters.")
print(numSentances(harper), "Sentances.")
print(numWords(newWords), "Words.")
print(uniqueWords(newWords), "Unique Words.")
print("The longest word is: ", longestWord(newWords))
obama1 = readFile("Obama's 2009 Speech.txt")
newWords = cleanUpWords(obama1)
print(numCharacters(obama1), "Characters.")
print(numSentances(obama1), "Sentances.")
print(numWords(obama1), "Words.")
print(uniqueWords(newWords), "Unique Words.")
print("The longest word is: ", longestWord(newWords))
obama2 = readFile("Obama's 2008 Speech.txt")
newWords = cleanUpWords(obama2)
print(numCharacters(obama2), "Characters.")
print(numSentances(obama2), "Sentances.")
print(numWords(obama2), "Words.")
print(uniqueWords(newWords), "Unique Words.")
print("The longest word is: ", longestWord(newWords))
def readFile(filename):
'''Function that reads a text file, then prints the name of file without
'.txt'. The fuction returns the read file for main() to call, and print's
the file's name so the user knows which file is read'''
inFile1 = open(filename, "r")
fileContentsList = inFile1.read()
inFile1.close()
print("\n", filename.replace(".txt", "") + ":")
return fileContentsList
def numCharacters(file):
'''Fucntion returns the length of the READ file (not readlines because it
would only read the amount of lines and counting characters would be wrong),
which will be the correct amount of total characters in the text file.'''
return len(file)
def numSentances(file):
'''Function returns the occurances of a period, exclamation point, or
a question mark, thus counting the amount of full sentances in the text file.'''
return file.count(".") + file.count("!") + file.count("?")
def cleanUpWords(file):
words = (file.replace("-", " ").replace(" ", " ").replace("\n", " "))
onlyAlpha = ""
for i in words:
if i.isalpha() or i == " ":
onlyAlpha += i
return onlyAlpha.replace(" ", " ")
def numWords(newWords):
'''Function finds the amount of words in the text file by returning
the length of the cleaned up version of words from cleanUpWords().'''
return len(newWords.split())
def uniqueWords(newWords):
unique = sorted(newWords.split())
unique = set(unique)
return str(len(unique))
def longestWord(file):
max(file.split())
main()
So, my last two functions uniqueWords, and longestWord will not work properly, or at least my output is wrong. for unique words, i'm supposed to get 527, but i'm actually getting 567 for some odd reason. Also, my longest word function is always printing none, no matter what i do. I've tried many ways to get the longest word, the above is just one of those ways, but all return none. Please help me with my two sad functions!
Try to do it this way:
def longestWord(file):
return sorted(file.split(), key = len)[-1]
Or it would be even easier to do in uniqueWords
def uniqueWords(newWords):
unique = set(newWords.split())
return (str(len(unique)),max(unique, key=len))
info = uniqueWords("My name is Harper")
print("Unique words" + info[0])
print("Longest word" + info[1])
and you don't need sorted
before set
to get all unique words
because set it's an Unordered collections of unique elements
And look at cleanUpWords
. Because if you will have string like that Hello I'm Harper. Harper I am.
After cleaning it up you will get 6 unique words, because you will have word Im
.