What is the code to split a sentence into a list of its constituent words AND punctuation? Most text preprocessing programs tend to remove punctuation.
For example, if I enter this:
"Punctuations to be included as its own unit."
The desired output would be:
result = ['Punctuations', 'to', 'be', 'included', 'as', 'its', 'own', 'unit', '.']
Many thanks!
You might want to consider using the Natural Language Toolkit (nltk).
Try this:
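import nltk
# One-time setup if the tokenizer data is missing: nltk.download('punkt')
# (newer NLTK releases may ask for 'punkt_tab' instead).
sentence = "Punctuations to be included as its own unit."
# word_tokenize keeps punctuation marks as separate tokens
tokens = nltk.word_tokenize(sentence)
print(tokens)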
Output: ['Punctuations', 'to', 'be', 'included', 'as', 'its', 'own', 'unit', '.']
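If installing nltk is not an option, a regular expression from the standard library gets the same result for a simple sentence like this one. This is only a minimal sketch, not what nltk does internally: the function name split_words_and_punct is made up for illustration, and the pattern treats every non-word, non-space character as its own token.

```python
import re

def split_words_and_punct(sentence):
    # Hypothetical helper: \w+ grabs runs of letters/digits/underscores,
    # [^\w\s] grabs any single character that is neither word nor whitespace.
    return re.findall(r"\w+|[^\w\s]", sentence)

sentence = "Punctuations to be included as its own unit."
result = split_words_and_punct(sentence)
print(result)
```

Note that the two approaches differ on contractions: this pattern splits "don't" into 'don', "'", 't', whereas nltk.word_tokenize returns 'do', "n't". For anything beyond simple sentences, the nltk tokenizer is likely the better choice.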