I think what I want to do is a fairly common task, but I've found no reference for it on the web. I have text with punctuation, and I want a list of the words.
"Hey, you - what are you doing here!?"
should be
['hey', 'you', 'what', 'are', 'you', 'doing', 'here']
But Python's str.split() accepts only one separator argument, so splitting on whitespace leaves the punctuation attached to the words. Any ideas?
A case where regular expressions are justified:
import re
DATA = "Hey, you - what are you doing here!?"
# [\w']+ matches runs of word characters and apostrophes, so spaces and punctuation are skipped
print(re.findall(r"[\w']+", DATA))
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']
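The question asks for lowercase words, while the snippet above keeps the original capitalization. A minimal sketch of one way to get the requested output is to lowercase the text before matching. The helper name words is hypothetical, chosen only for this illustration:

```python
import re

def words(text):
    # Hypothetical helper: lowercase first, then collect runs of
    # word characters and apostrophes (same pattern as above).
    return re.findall(r"[\w']+", text.lower())

DATA = "Hey, you - what are you doing here!?"
print(words(DATA))
# Prints ['hey', 'you', 'what', 'are', 'you', 'doing', 'here']
```

Note that \w also matches digits and underscores, so tokens such as "42" or "foo_bar" are kept as words. If that is not wanted, the character class would need to be narrowed.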