I am trying to split a passage of text into sentences using three delimiters (period, semicolon, and question mark). I can think of using the split() method in Java and saving the resulting arrays into an ArrayList:
String[] sentencesByPeriod = passage.split("\\.");
String[] sentencesBySemicolon = passage.split("\\;");
String[] sentencesByQuestionM = passage.split("\\?");
List<String> allSentences = new ArrayList<String>();
allSentences.addAll(Arrays.asList(sentencesByPeriod));
allSentences.addAll(Arrays.asList(sentencesBySemicolon));
allSentences.addAll(Arrays.asList(sentencesByQuestionM));
This works, but I am wondering whether there is a more efficient way to do this. Thanks.
You can do this with a single regex by putting all three delimiters in a character class:
String[] s = passage.split("[.;?]");
List<String> allSentences = new ArrayList<String>();
allSentences.addAll(Arrays.asList(s));
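Splitting on the character class leaves the whitespace that followed each delimiter at the start of the next piece, and consecutive delimiters produce empty strings. Here is a minimal sketch of one way to clean that up, assuming you want trimmed, non-empty sentences; the class and method names (SentenceSplitDemo, splitSentences) are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class SentenceSplitDemo {
    // Splits on '.', ';' or '?', then trims each piece and skips empty ones.
    static List<String> splitSentences(String passage) {
        List<String> sentences = new ArrayList<>();
        for (String part : passage.split("[.;?]")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                sentences.add(trimmed);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // prints [Is it late, Yes, go home]
        System.out.println(splitSentences("Is it late? Yes; go home."));
    }
}
```

Note that the delimiters themselves are discarded, so you cannot tell afterwards whether a sentence ended in a period or a question mark.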
Or use a StringTokenizer:
StringTokenizer tokenizer = new StringTokenizer(passage, ".;?");
List<String> s = new ArrayList<String>();
while (tokenizer.hasMoreTokens()) {
    s.add(tokenizer.nextToken());
}
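The two approaches are not quite equivalent: split() keeps the empty strings between adjacent delimiters, while StringTokenizer never returns an empty token. Also, the StringTokenizer Javadoc describes it as a legacy class whose use is discouraged in new code, so split() is usually the better default. A small sketch of the difference (the class name TokenizerVsSplitDemo and the helper withTokenizer are hypothetical, just for this comparison):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerVsSplitDemo {
    // Collects all tokens delimited by '.', ';' or '?' using StringTokenizer.
    static List<String> withTokenizer(String passage) {
        StringTokenizer tokenizer = new StringTokenizer(passage, ".;?");
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String passage = "One..Two";
        // split() yields "One", "" and "Two", so this prints 3
        System.out.println(passage.split("[.;?]").length);
        // StringTokenizer skips the empty token, so this prints 2
        System.out.println(withTokenizer(passage).size());
    }
}
```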