I'm trying to create a function that removes punctuation and lowercases every letter in a string. Then, it should return all this in the form of a dictionary that counts the word frequency in the string.
This is the code I wrote so far:
def word_dic(string):
    string = string.lower()
    new_string = string.split(' ')
    result = {}
    for key in new_string:
        if key in result:
            result[key] += 1
        else:
            result[key] = 1
    for c in result:
        "".join([ c if not c.isalpha() else "" for c in result])
    return result
But this is what I'm getting after executing it:
{'am': 3,
'god!': 1,
'god.': 1,
'i': 2,
'i?': 1,
'thanks': 1,
'to': 1,
'who': 2}
I just need to remove the punctuation at the end of the words.
"".join([ c if not c.isalpha() else "" for c in result])
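builds a new string, but it doesn't do anything with it; the value is thrown away immediately, because you never store the result. (As written, it also loops over the keys of result rather than the characters of each word, and the not c.isalpha() test keeps the non-alphabetic keys instead of dropping punctuation, so storing the result wouldn't help either.)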
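As a minimal sketch of why the result has to be stored (the variable names word, unchanged, and cleaned are made up for illustration and are not part of your code): strings are immutable, so str.join can only hand back a new string, and that string is lost unless it is bound to a name.

```python
word = "god!"

# The expression is evaluated, but its value is discarded; word is untouched.
"".join([c for c in word if c.isalpha()])
unchanged = word

# Binding the return value to a name keeps the cleaned string.
cleaned = "".join([c for c in word if c.isalpha()])

print(unchanged)  # god!
print(cleaned)    # god
```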
Really, the best way to do this is to normalize your keys before counting them in result. For example, you might do:
for key in new_string:
    # Keep only the alphabetic parts of each key, and replace key for future use
    key = "".join([c for c in key if c.isalpha()])
    if key in result:
        result[key] += 1
    else:
        result[key] = 1
Now result never has keys with punctuation (and the counts for "god." and "god!" are summed under the key "god" alone), and there is no need for another pass to strip the punctuation after the fact.
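Put together, the whole function might look like the sketch below. A few things here are my own assumptions rather than anything from your code: the parameter is called text, split() is called with no argument so that runs of whitespace don't produce empty tokens, tokens that end up empty after cleaning are skipped, and the sample sentence is invented for the demonstration.

```python
def word_dic(text):
    """Count word frequencies, ignoring case and non-alphabetic characters."""
    result = {}
    # split() with no argument splits on any run of whitespace.
    for key in text.lower().split():
        # Keep only the alphabetic characters of each token.
        key = "".join([c for c in key if c.isalpha()])
        if not key:
            continue  # the token contained no letters at all
        if key in result:
            result[key] += 1
        else:
            result[key] = 1
    return result


# Invented sample input, not the string from the question.
print(word_dic("Who am I? I am who I am. Thanks to God! God."))
# {'who': 2, 'am': 3, 'i': 3, 'thanks': 1, 'to': 1, 'god': 2}
```

Note that this also removes apostrophes and digits, so "it's" would be counted as "its".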
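Alternatively, if you only care about trailing punctuation on each word (so "it's" should be preserved as is, not converted to "its"), you can simplify a lot further. Add import string at the top of the file, rename the function's parameter to something other than string (otherwise it shadows the module, and string.punctuation raises an AttributeError), then change: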
key = "".join([c for c in key if c.isalpha()])
to:
key = key.rstrip(string.punctuation)
This matches what you specifically asked for in your question (remove punctuation at the end of words, but not at the beginning or embedded within the word).
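Here is a sketch of that variant as a complete function. The parameter name text (chosen so that it doesn't shadow the string module), the use of split() with no argument, the skipping of empty tokens, and the sample sentence are all my assumptions for illustration.

```python
import string


def word_dic(text):
    """Count word frequencies, stripping only trailing punctuation."""
    result = {}
    for key in text.lower().split():
        # rstrip removes any run of punctuation characters from the right end only.
        key = key.rstrip(string.punctuation)
        if not key:
            continue  # the token was nothing but punctuation
        if key in result:
            result[key] += 1
        else:
            result[key] = 1
    return result


# Invented sample input: embedded and leading punctuation survive.
print(word_dic("It's mine, it's not 'yours'!"))
# {"it's": 2, 'mine': 1, 'not': 1, "'yours": 1}
```

If you later decide leading punctuation should go as well, key.strip(string.punctuation) trims both ends while still leaving embedded characters such as the apostrophe in "it's" alone.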