NLTK sent_tokenize splits a quoted sentence into two

Here is the code:

import nltk

x = '"What do you mean?" asked Jack, looking down.'
nltk.tokenize.sent_tokenize(x)

Here is the output:

['"What do you mean?"', 'asked Jack, looking down.']

What I would like to get:

['"What do you mean?" asked Jack, looking down.']

I am not sure how to fix this. Any help would be appreciated, thanks!

Solution

You are using 'sent_tokenize()', which splits text into sentences rather than words. It treats the question mark '?' and the full stop '.' as possible ends of sentences, which is why it splits your string after the quoted question and returns two items.

Read about NLTK tokenizers here - https://www.nltk.org/api/nltk.tokenize.html

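Note that x.split(',') does not produce the expected output: it splits at the comma and returns ['"What do you mean?" asked Jack', ' looking down.']. If the string is already known to be a single sentence, as it is here, you can skip the tokenizer and wrap it in a list:

[x]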
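
For text with more than one sentence, one option is to post-process the output of sent_tokenize() and rejoin pieces that look like a quote separated from its attribution. The sketch below is a heuristic of my own, not something NLTK provides: the function name merge_quote_splits is hypothetical, and the rule (previous piece ends with a closing quote, next piece starts with a lowercase letter) is an assumption that will not fit every text.

```python
def merge_quote_splits(sentences):
    """Rejoin pieces where a quoted sentence was split from its attribution.

    Heuristic (an assumption, not NLTK behavior): if a piece ends with a
    closing quote and the next piece starts with a lowercase letter, the
    two pieces are treated as one sentence.
    """
    closing_quotes = ('"', "'", "\u201d", "\u2019")
    merged = []
    for piece in sentences:
        if (merged
                and merged[-1].endswith(closing_quotes)
                and piece[:1].islower()):
            # Glue this piece onto the previous one with a single space.
            merged[-1] = merged[-1] + " " + piece
        else:
            merged.append(piece)
    return merged


# The list below is the sent_tokenize() output shown in the question.
pieces = ['"What do you mean?"', 'asked Jack, looking down.']
print(merge_quote_splits(pieces))
# ['"What do you mean?" asked Jack, looking down.']
```

In practice you would call merge_quote_splits(nltk.tokenize.sent_tokenize(x)). The pieces are rejoined with a single space, so any other whitespace that originally separated them is not preserved.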