The program correctly identifies the words regardless of punctuation. I am having trouble integrating this into spam_indicator(text).
```python
def spam_indicator(text):
    for char in string.punctuation:
        text = text.replace(char, '')
        return word
    for word in text:
        if word.lower() not in words:
            words.append(word.lower())
            w += 1
            if word.lower() in SPAM_WORDS:
                s += 1
    return float("{:.2f}".format(s/w))
```
The second block is wrong. I am trying to remove the punctuation before running the rest of the function.
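Try removing the punctuation first, then split the text into words. In your version, `return word` sits inside the punctuation loop, so the function returns on the very first iteration (before `word` has even been defined), and looping over an unsplit string iterates over characters rather than words.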
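```python
import string

# SPAM_WORDS is defined elsewhere in your program
def spam_indicator(text):
    for char in string.punctuation:
        text = text.replace(char, ' ')    # N.B. replace with ' ', not ''
    text = text.split()    # split only after all punctuation is gone
    w = 0          # number of distinct words seen
    s = 0          # number of distinct spam words seen
    words = []     # distinct (lowercased) words seen so far
    for word in text:
        if word.lower() not in words:
            words.append(word.lower())
            w += 1
            if word.lower() in SPAM_WORDS:
                s += 1
    return float("{:.2f}".format(s/w))
```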
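To see why the replacement character matters, here is a small sketch comparing the two choices. `strip_punctuation` and the sample string are hypothetical, written only for this illustration:

```python
import string

def strip_punctuation(text, replacement):
    # Replace every ASCII punctuation character with `replacement`
    for char in string.punctuation:
        text = text.replace(char, replacement)
    return text

sample = "Win cash,prizes!Now"
print(strip_punctuation(sample, '').split())   # ['Win', 'cashprizesNow']
print(strip_punctuation(sample, ' ').split())  # ['Win', 'cash', 'prizes', 'Now']
```

Replacing with `''` glues together words that were separated only by punctuation, so they never match anything in `SPAM_WORDS`; replacing with `' '` lets `split()` pull them apart.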
There are several improvements that could be made to your code:

- Use a set rather than a list. Since a set cannot contain duplicates, you don't need to check whether you've already seen a word before adding it.
- Use `str.translate()` to remove the punctuation. Replace punctuation with whitespace so that `split()` will split the text into words.
- Use `round()` instead of formatting to a string and then converting back to a float.

Here is an example:
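```python
import string

def spam_indicator(text):
    # Map every punctuation character to a space so that split() separates words
    trans_table = {ord(c): ' ' for c in string.punctuation}
    text = text.translate(trans_table).lower()
    words = set(text.split())    # each distinct word appears once; no membership check needed
    word_count = 0               # distinct non-spam words
    spam_count = 0               # distinct spam words
    for word in words:
        if word not in SPAM_WORDS:    # SPAM_WORDS is defined elsewhere in your program
            word_count += 1
        else:
            spam_count += 1
    return round(spam_count / word_count, 2)
```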
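The dictionary comprehension is one way to build the table for `str.translate()`; `str.maketrans()` with two equal-length strings is another. The two tables are not identical (`str.maketrans()` stores code points such as `32` rather than the string `' '`), but `translate()` accepts either, so the resulting words match. A quick check on a made-up sample string:

```python
import string

sample = "Buy now!!! (limited-offer)"

# Translation table built with a dict comprehension, as in the example above
table_a = {ord(c): ' ' for c in string.punctuation}
# Alternative: str.maketrans(from_chars, to_chars) with two equal-length strings
table_b = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

words_a = sample.translate(table_a).split()
words_b = sample.translate(table_b).split()
print(words_a)             # ['Buy', 'now', 'limited', 'offer']
print(words_a == words_b)  # True
```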
You need to take care not to divide by 0 if there are no non-spam words. Anyway, I'm not sure what you want as the spam indicator value. Perhaps it should be the number of spam words divided by the total number of words (both spam and non-spam) to make it a value between 0 and 1?
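A minimal sketch of that alternative, assuming the indicator should be the fraction of distinct words that are spam words, and that text containing no words at all should score `0.0`. Both are assumptions, as is the small `SPAM_WORDS` set used here; `spam_ratio` is a hypothetical name. Instead of a loop, it counts spam words with a set intersection:

```python
import string

# Hypothetical spam vocabulary for illustration; your program defines its own SPAM_WORDS
SPAM_WORDS = {'free', 'win', 'cash', 'prize'}

def spam_ratio(text):
    """Fraction of distinct words in `text` that are spam words, rounded to 2 places."""
    trans_table = {ord(c): ' ' for c in string.punctuation}
    words = set(text.translate(trans_table).lower().split())
    if not words:
        return 0.0  # no words at all: avoid dividing by zero
    spam_count = len(words & SPAM_WORDS)  # set intersection = distinct spam words
    return round(spam_count / len(words), 2)

print(spam_ratio("Win FREE cash now!"))  # 0.75
print(spam_ratio("!!!"))                  # 0.0
```

Because the denominator counts every distinct word, spam or not, the result always lies between 0 and 1, and the empty-text guard removes the division-by-zero case.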