Is there documentation listing all the cases where word_tokenize behaves differently from (or better than) simply splitting on whitespace? If not, could someone provide a reasonably thorough list?
word_tokenize documentation: https://www.kite.com/python/docs/nltk.word_tokenize
The NLTK tokenize package documentation: https://www.nltk.org/api/nltk.tokenize.html
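For context, here is a minimal sketch of the kind of divergence I mean (assuming NLTK is installed and the Punkt tokenizer models have been downloaded):

```python
from nltk.tokenize import word_tokenize
# Assumes: pip install nltk, plus a one-time nltk.download("punkt")
# so the Punkt tokenizer models are available.

text = "Don't split me, please!"

# Plain whitespace split keeps punctuation glued to words
# and leaves contractions intact.
print(text.split())
# ["Don't", 'split', 'me,', 'please!']

# word_tokenize separates punctuation into its own tokens
# and splits contractions ("Don't" -> "Do" + "n't").
print(word_tokenize(text))
# ['Do', "n't", 'split', 'me', ',', 'please', '!']
```

Contraction splitting and punctuation separation are the two differences I've spotted so far; I'm asking whether a complete list exists somewhere.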