java, .net, text-segmentation

Finding first sentence in a paragraph


I have a string that contains a paragraph, possibly with line breaks. I want to get only the first sentence of that string. I thought I would try

indexOf(". ") 

that is, a dot followed by a space.

The problem is that this won't work on text such as firstName. LastName., where the first dot does not end a sentence.
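To make the failure concrete, here is a minimal Java sketch of the indexOf(". ") idea. The class and method names (NaiveFirstSentence, firstSentence) are made up for illustration, and the sample strings are invented; the same idea carries over to String.IndexOf in .NET.

```java
final class NaiveFirstSentence {

    // Returns the text up to and including the first ". " match,
    // or the whole input if there is no such match.
    static String firstSentence(String paragraph) {
        int end = paragraph.indexOf(". ");
        return end < 0 ? paragraph : paragraph.substring(0, end + 1);
    }

    public static void main(String[] args) {
        // Works when the first ". " really ends a sentence.
        System.out.println(firstSentence("It rained all day. We stayed in."));
        // Cuts too early: the dot after "firstName" is not a sentence end.
        System.out.println(firstSentence("The key is firstName. LastName. It must be unique."));
    }
}
```

The second call returns "The key is firstName.", which is exactly the premature cut described above.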

I'm using .NET. Is there a good method available to achieve this? I'm also tagging Java to see if I can narrow down my search.


Solution

  • What you need is a Natural Language Processing (NLP) toolkit. Writing one yourself is very hard, as it requires a lot of research and data collection, but luckily it has already been done for you.

    .NET

    SharpNLP is a collection of natural language processing tools written in C#. Currently it provides the following NLP tools:

    • a sentence splitter
    • ...

    Java
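For simple cases, a lighter-weight option than a full toolkit is the JDK's own java.text.BreakIterator, which offers a locale-aware sentence iterator. The sketch below is only an illustration under that assumption: the class name FirstSentenceSplitter and the helper firstSentence are hypothetical, and because BreakIterator is rule-based rather than a trained model, abbreviations and inputs like the firstName. LastName. example may still be split in the wrong place.

```java
import java.text.BreakIterator;
import java.util.Locale;

final class FirstSentenceSplitter {

    // Returns the first sentence as delimited by the JDK's rule-based
    // sentence BreakIterator, with surrounding whitespace trimmed.
    static String firstSentence(String paragraph, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(paragraph);
        int start = it.first();
        int end = it.next();
        // next() returns DONE when there is no further boundary.
        if (end == BreakIterator.DONE) {
            return paragraph.trim();
        }
        return paragraph.substring(start, end).trim();
    }

    public static void main(String[] args) {
        String text = "It rained all day. We stayed in.";
        System.out.println(firstSentence(text, Locale.US));
    }
}
```

Unlike a plain indexOf(". "), the iterator also recognizes sentence terminators such as "?" and "!", but it should be checked against real input before being relied on.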