
How do I tokenize these input sentences on the following punctuation marks ("!", "?", ".", ",")?


I imported sent_tokenize from nltk.tokenize and call sent_tokenize(input). Below is the function I've used to try to solve this, but the test cases keep failing. I've searched online and tried to understand the methods used elsewhere, but it's still not working. I have no idea what I did wrong. Please help.

from nltk.tokenize import sent_tokenize
def tokenise(input, expected_output):
  input = "Excuse me, where can I find a chicken rice shop?"
  expected_output = ['Excuse me', 'where can I find a chicken rice shop']
  result = sent_tokenize(input)
  print('Pass' if result == expected_output else 'Failed!')

# Please make sure all test cases return 'Pass'
tokenise(tcase1, tans1)
tokenise(tcase2, tans2)
tokenise(tcase3, tans3)
print('Pass' if result == expected_output else 'Failed!')



Test case 1:
Input: Excuse me, where can I find a chicken rice shop?
Expected output: ["Excuse me", "where can I find a chicken rice shop"]

Test case 2:
Input: OMG!!! It is Friday....where should we go for dinner?
Expected output: ["OMG", "It is Friday", "where should we go for dinner"]

Test case 3:
Input: He’s nervous, but on the surface he looks calm and ready.
Expected output: ["He’s nervous", "but on the surface he looks calm and ready"]

Solution

  • Apart from overwriting your function arguments inside the function and the variable naming issue, your code works exactly as it should.

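    sent_tokenize() splits text into full sentences. Ellipses and commas do not end a sentence, so no split happens at those characters. It also keeps the sentence-ending punctuation in each returned sentence, so even where it does split, the result will not match expected outputs that have the punctuation stripped.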

    Your code, but working:

    from nltk.tokenize import sent_tokenize
    
    def tokenise(inp, expected_output):  
      res = sent_tokenize(inp)  
      print(res, 'Pass' if res == expected_output else 'Failed!')
    
    # Please make sure all test cases return 'Pass'
    #Test case 1:
    tokenise("Excuse me, where can I find a chicken rice shop?", ["Excuse me", "where can I find a chicken rice shop"])
    
    #Test case 2:
    tokenise("OMG!!! It is Friday....where should we go for dinner?",  ["OMG", "It is Friday", "where should we go for dinner"])
    
    #Test case 3:
    tokenise("He’s nervous, but on the surface he looks calm and ready.",  ["He’s nervous", "but on the surface he looks calm and ready"])
    

    Each of these prints Failed!, as I would expect.
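
If the goal is to split on the punctuation marks themselves rather than on sentence boundaries, one possible approach is a regular expression split instead of sent_tokenize(). The sketch below is only an illustration, not part of NLTK: the helper name split_on_punctuation is hypothetical, and it assumes that runs of "!", "?", "." and "," should count as a single delimiter and that empty pieces and surrounding whitespace should be dropped.

```python
import re

# Hypothetical helper (not an NLTK function): split text on runs of
# the delimiters ! ? . , and discard empty or whitespace-only pieces.
def split_on_punctuation(text):
    # "[!?.,]+" matches one or more consecutive delimiter characters,
    # so "!!!" or "...." is treated as a single split point.
    pieces = re.split(r"[!?.,]+", text)
    # Strip surrounding whitespace and drop the empty string left
    # behind when the text ends with a delimiter.
    return [piece.strip() for piece in pieces if piece.strip()]


def tokenise(inp, expected_output):
    res = split_on_punctuation(inp)
    print(res, 'Pass' if res == expected_output else 'Failed!')


tokenise("Excuse me, where can I find a chicken rice shop?",
         ["Excuse me", "where can I find a chicken rice shop"])
tokenise("OMG!!! It is Friday....where should we go for dinner?",
         ["OMG", "It is Friday", "where should we go for dinner"])
tokenise("He’s nervous, but on the surface he looks calm and ready.",
         ["He’s nervous", "but on the surface he looks calm and ready"])
```

Under those assumptions all three test cases should print Pass. Note that this splits on every period, so text containing abbreviations or decimal numbers (for example "3.5") would be split there too.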