I would like to tokenize a list of sentences, but keep negated verbs as single tokens.
from nltk.tokenize import word_tokenize
t = """As aren't good. Bs are good"""
print(word_tokenize(t))
['As', 'are', "n't", 'good', '.', 'Bs', 'are', 'good']
I would like "aren't" to stay a single token, distinct from "are". With word_tokenize the contraction is split and I get "n't" as its own token. The same happens with other negated forms (couldn't, didn't, etc.).
How can I do it? Thanks in advance
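If you want to extract individual words from a space-separated sentence, use Python's str.split() method. Called with no arguments, it splits on any run of whitespace, so contractions such as "aren't" stay intact.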
t = "As aren't good. Bs are good"
print(t.split())
['As', "aren't", 'good.', 'Bs', 'are', 'good']
You can pass a different delimiter to split() as well. For example, to split the string on full stops, you could do something like this:
print(t.split("."))
["As aren't good", ' Bs are good']
See the Python documentation for str.split() for details.
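One caveat with plain split(): punctuation stays attached to the neighbouring word, which is why 'good.' shows up in the output above. If you also want punctuation as separate tokens while keeping contractions whole, a regular expression is one possible approach. The following is only a minimal sketch: the pattern and the helper name tokenize_keep_contractions are my own illustration, not part of NLTK, and it assumes contractions are written with a straight ASCII apostrophe.

```python
import re

# Match a word optionally followed by apostrophe + letters (e.g. "aren't"),
# or any single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")

def tokenize_keep_contractions(text):
    """Split text into words and punctuation, keeping contractions whole."""
    return TOKEN_PATTERN.findall(text)

t = "As aren't good. Bs are good"
print(tokenize_keep_contractions(t))
# ['As', "aren't", 'good', '.', 'Bs', 'are', 'good']
```

Text that uses curly apostrophes (’) would need the pattern extended, so treat this as a starting point rather than a full tokenizer.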