I am new to NLP. I'm using spaCy and NLTK to count the sentences in a JSON file, but there is a big difference between the two answers. I thought the answers would be the same. Can anyone explain why, or point me to a web link that covers this? I'm confused here.
Sentence segmentation and tokenization are NLP subtasks, and each NLP library may implement them differently, leading to different error profiles.
Even within the spaCy library there are different approaches: the best results are obtained by using the dependency parser, but a simpler rule-based sentencizer component also exists, which is faster but usually makes more mistakes (docs here).
Because no implementation will be 100% perfect, you will get discrepancies between different methods and different libraries. What you can do is print the cases in which the methods disagree, inspect these manually, and get a feel for which of the approaches works best for your specific domain and type of texts.
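As a minimal sketch of that workflow, here is how two segmentation methods can disagree on the same text, and how to print the disagreements. This assumes only that spaCy is installed (the rule-based sentencizer needs no trained model); the sample text and the naive period-split baseline are just illustrations.

```python
import spacy

# A text with abbreviations, which commonly trip up naive sentence splitting.
text = "Dr. Smith visited the U.S. last year. He met Prof. Jones. They talked."

# Method 1: naive split on ". " (roughly what a quick regex-based count does).
# This wrongly breaks after "Dr.", "U.S." and "Prof.".
naive_sentences = [s for s in text.split(". ") if s]

# Method 2: spaCy's rule-based sentencizer (no model download required).
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
doc = nlp(text)
spacy_sentences = [sent.text for sent in doc.sents]

print("naive count:", len(naive_sentences))
print("spaCy count:", len(spacy_sentences))

# Inspect the disagreements manually, as suggested above.
for s in spacy_sentences:
    if s not in naive_sentences:
        print("spaCy-only sentence:", repr(s))
```

Running the same comparison with NLTK's `sent_tokenize` (after downloading the `punkt` data) or with spaCy's full dependency parser (via a trained pipeline such as `en_core_web_sm`) will again give slightly different boundaries; inspecting those cases on your own data is the way to pick a method.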