java split sentence

Split a passage into sentences on periods, semicolons, and question marks


I am trying to split a passage of text into sentences using three delimiters: period, semicolon, and question mark. I can think of using the split() method in Java and saving the resulting arrays into an ArrayList:

String[] sentencesByPeriod = passage.split("\\.");
String[] sentencesBySemicolon = passage.split("\\;");
String[] sentencesByQuestionM = passage.split("\\?");

List<String> allSentences = new ArrayList<String>();
allSentences.addAll(Arrays.asList(sentencesByPeriod));
allSentences.addAll(Arrays.asList(sentencesBySemicolon));
allSentences.addAll(Arrays.asList(sentencesByQuestionM));

This works, but is there a more efficient way to do it? Thanks.


Solution

  • You can do this with a single regex, using a character class that matches any of the three delimiters:

    // Inside [...], the characters '.', ';' and '?' are literals, so no escaping is needed
    String[] s = passage.split("[.;?]");
    List<String> allSentences = new ArrayList<String>();
    allSentences.addAll(Arrays.asList(s));
    

    Or use a StringTokenizer, which treats each character of its second argument as a delimiter:

    StringTokenizer tokenizer = new StringTokenizer(passage, ".;?");
    List<String> s = new ArrayList<String>();
    while (tokenizer.hasMoreTokens()) {
        s.add(tokenizer.nextToken());
    }
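
For reference, here is a minimal self-contained sketch of the single-regex approach. The class name SentenceSplitter and the method splitSentences are made up for illustration. Trimming whitespace and skipping empty fragments are extra steps that are not part of the answer above; they are assumed here because splitting on "[.;?]" leaves a leading space on every sentence after the first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for illustration only.
class SentenceSplitter {

    // Splits on '.', ';' or '?' in a single pass, trims surrounding whitespace,
    // and skips empty fragments (for example those produced by "...").
    static List<String> splitSentences(String passage) {
        List<String> sentences = new ArrayList<>();
        // Inside a character class, '.', ';' and '?' are literals.
        for (String part : passage.split("[.;?]")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                sentences.add(trimmed);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String passage = "Is this one call? Yes; it is. Done.";
        for (String sentence : splitSentences(passage)) {
            System.out.println(sentence);
        }
    }
}
```

Note that the delimiters themselves are dropped, just as in the original three-split version. Unlike that version, each sentence appears exactly once in the result, in the order it occurs in the passage.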