
Split Strings into words with multiple word boundary delimiters


This seems like a fairly common task, but I haven't found any reference for it on the web. I have text with punctuation, and I want a list of the words.

"Hey, you - what are you doing here!?"

should be

['hey', 'you', 'what', 'are', 'you', 'doing', 'here']

But Python's str.split() accepts only a single separator, so after splitting on whitespace the punctuation is still attached to the words. Any ideas?
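A short illustration of the problem described above: with no argument, str.split() splits on runs of whitespace only, so the punctuation stays glued to the neighbouring words.

```python
DATA = "Hey, you - what are you doing here!?"

# str.split() with no argument splits on runs of whitespace only,
# so "," and "!?" stay attached and the lone "-" becomes its own token
words = DATA.split()
print(words)
# Prints ['Hey,', 'you', '-', 'what', 'are', 'you', 'doing', 'here!?']
```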


Solution

  • A case where regular expressions are justified:

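    import re

    DATA = "Hey, you - what are you doing here!?"
    # [\w']+ matches runs of word characters and apostrophes, so everything
    # else (spaces, commas, dashes, "!?") acts as a delimiter
    print(re.findall(r"[\w']+", DATA))
    # Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']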
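The regex above keeps the original capitalisation, while the question asks for lowercase words. A minimal sketch that lowercases first, plus an alternative that splits on an explicit list of delimiters with re.split. The helper names `words` and `split_on` and the delimiter list are hypothetical, chosen only for illustration. Note that `\w` also matches digits and underscores, and in Python 3 it is Unicode-aware.

```python
import re

def words(text):
    """Hypothetical helper: lowercase words of text, punctuation dropped."""
    # [\w']+ keeps letters, digits, underscores and apostrophes together
    return re.findall(r"[\w']+", text.lower())

def split_on(text, delimiters):
    """Hypothetical helper: split text on any of the given delimiter strings."""
    # Escape each delimiter so characters like "?" or "-" are literal, and
    # use "+" so a run of adjacent delimiters (e.g. ", " or " - ") counts
    # as a single boundary
    pattern = "(?:" + "|".join(map(re.escape, delimiters)) + ")+"
    # re.split yields an empty string when the text ends with a delimiter
    # ("!?" here), so filter empties out
    return [w for w in re.split(pattern, text) if w]

DATA = "Hey, you - what are you doing here!?"
print(words(DATA))
# Prints ['hey', 'you', 'what', 'are', 'you', 'doing', 'here']
print(split_on(DATA.lower(), [" ", ",", "-", "!", "?"]))
# Prints ['hey', 'you', 'what', 'are', 'you', 'doing', 'here']
```

The findall version describes what a word *is*, so it needs no delimiter list; the re.split version is handier when you know exactly which characters should separate words and everything else (including apostrophes or digits) should be kept.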