
What are the cases where NLTK's word_tokenize differs from str.split()?


Is there documentation listing all the cases where word_tokenize behaves differently from (or better than) simply splitting on whitespace? If not, could someone give a semi-thorough list?


Solution

  • `word_tokenize` documentation: https://www.kite.com/python/docs/nltk.word_tokenize

    The NLTK tokenize package documentation: https://www.nltk.org/api/nltk.tokenize.html