
How to tokenize sentences using NLTK


I'm new to NLP. I'm trying to split text into sentences using NLTK on Python 3.7, so I used the following code:

import nltk
text4 = "This is the first sentence.A gallon of milk in the U.S. cost $2.99.Is this the third sentence?Yes,it is!"
x = nltk.sent_tokenize(text4)
x[0]

I expected x[0] to return the first sentence, but I got the whole string back:

Out[4]: 'This is the first sentence.A gallon of milk in the U.S. cost $2.99.Is this the third sentence?Yes,it is!'

Am I doing anything wrong?


Solution

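  • The tokenizer needs proper spacing after sentence-ending punctuation. Without a space after the period, it does not treat the period as a sentence boundary, so the text comes back as one sentence: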

    import nltk
    
    text4 = "This is a sentence. This is another sentence."
    nltk.sent_tokenize(text4)
    
    # ['This is a sentence.', 'This is another sentence.']
    
    # Versus what you had before (no space after the period):
    
    nltk.sent_tokenize("This is a sentence.This is another sentence.")
    
    # ['This is a sentence.This is another sentence.']
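
If you cannot fix the spacing at the source, one option is to insert the missing spaces before tokenizing. The following is only a rough sketch, not part of NLTK: `add_missing_spaces` and `split_sentences` are hypothetical helper names, and the regex is a heuristic that assumes a sentence ends with a lowercase letter or digit and the next one starts with a capital letter. It leaves "U.S." and "$2.99" alone, but it can misfire on text such as "file.Name".

```python
import re

# Heuristic (an assumption, not an NLTK feature): a '.', '!' or '?' that
# directly follows a lowercase letter or digit and is directly followed by
# an uppercase letter is treated as a sentence end with a missing space.
_MISSING_SPACE = re.compile(r"(?<=[a-z0-9])([.!?])(?=[A-Z])")


def add_missing_spaces(text):
    """Insert a space after sentence-ending punctuation where it is missing."""
    return _MISSING_SPACE.sub(r"\1 ", text)


def split_sentences(text):
    """Repair spacing, then hand the text to NLTK's sentence tokenizer.

    Requires nltk and its Punkt sentence model data to be installed.
    """
    import nltk  # imported here so add_missing_spaces works without nltk

    return nltk.sent_tokenize(add_missing_spaces(text))


text4 = ("This is the first sentence.A gallon of milk in the U.S. cost "
         "$2.99.Is this the third sentence?Yes,it is!")
fixed = add_missing_spaces(text4)
# fixed == "This is the first sentence. A gallon of milk in the U.S. cost "
#          "$2.99. Is this the third sentence? Yes,it is!"
```

Calling `split_sentences(text4)` then runs `nltk.sent_tokenize` on the repaired string, which gives the tokenizer the whitespace it relies on to find boundaries.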