objective-c ios ios5 ios4

Separate Full Sentences in a block of NSString text


I have been trying to use a regular expression to separate full sentences in a big block of text. I can't use componentsSeparatedByCharactersInSet: because it fails on sentences ending in ?!, !!, or an ellipsis (...). I have seen some external classes that offer something like componentSeparateByRegEx, but I would prefer to do this without adding an external library.

Here is a sample input: Hi, I am testing. How are you? Wow!! this is the best, and I am happy.

The output should be an array:

first element: Hi, I am testing.

second element: How are you?

third element: Wow!!

fourth element: this is the best, and I am happy.

This is what I have, but as I mentioned, it doesn't do what I intend. A regular expression would probably do a much better job here.

// Splits on every '.', '?' or '!'. The separators themselves are dropped,
// and runs such as "!!" produce empty strings in the result.
- (NSArray *)getArrayOfFullSentencesFromBlockOfText:(NSString *)textBlock {
    NSMutableCharacterSet *characterSet = [[NSMutableCharacterSet alloc] init];
    [characterSet addCharactersInString:@".?!"];
    NSArray *sentenceArray = [textBlock componentsSeparatedByCharactersInSet:characterSet];
    return sentenceArray;
}

Thanks for your help,


Solution

  • You want to use -[NSString enumerateSubstringsInRange:options:usingBlock:] with the NSStringEnumerationBySentences option. This will give you every sentence, and it does so in a language-aware manner.

    NSArray *fullSentencesFromText(NSString *text) {
        NSMutableArray *results = [NSMutableArray array];
        // Walk the whole string one sentence at a time.
        [text enumerateSubstringsInRange:NSMakeRange(0, [text length])
                                 options:NSStringEnumerationBySentences
                              usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
            [results addObject:substring];
        }];
        return results;
    }

    Note, in testing, each substring appears to contain the trailing spaces after the punctuation. You may want to strip those out.
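
If you do want the trailing spaces removed, one possible sketch is to trim each substring before adding it. The function name trimmedSentencesFromText is made up for illustration, and skipping empty results is my own assumption about what you'd want. Exactly where the sentence boundaries fall is up to the system's sentence detection, so check it against your own text.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (the name is illustrative, not from the answer above):
// enumerates sentences and trims surrounding whitespace from each one.
static NSArray *trimmedSentencesFromText(NSString *text) {
    NSMutableArray *results = [NSMutableArray array];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    [text enumerateSubstringsInRange:NSMakeRange(0, [text length])
                             options:NSStringEnumerationBySentences
                          usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        // Remove leading/trailing spaces and newlines; punctuation is kept.
        NSString *trimmed = [substring stringByTrimmingCharactersInSet:whitespace];
        // Assumption: whitespace-only substrings are not wanted in the result.
        if ([trimmed length] > 0) {
            [results addObject:trimmed];
        }
    }];
    return results;
}
```

Trimming only touches the ends of each substring, so the sentence's own punctuation and inner spacing stay as they were.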