
How to extract a substring which matches a pattern?


I need to parse large HTML files and extract the substrings that match a certain pattern. For example:

<span id='report-9429'>Report for May 2009</span>
A lot of code and text.
<span id='report-10522'>Report for Apr 2009</span>
A lot of code and text.
<span id='report-15212'>Report for Apr 2009</span>

Here, 9429, 10522 and 15212 are the parts I need to extract into an array of substrings. The file contains many of these spans, and I need all of them.

Is there some kind of regular-expression feature in Cocoa? And what would such a regex look like?


Solution

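  • You could use NSRegularExpression, but it was introduced in Mac OS X 10.7 (Lion) and is not available on Snow Leopard (10.6). On Snow Leopard, use a third-party library such as RegexKit instead.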

    The regex could look like this, with capture group 1 holding the numeric ID:

    <span id='report-(\d+)'>Report for \w+ \d+</span>
    

    For NSRegularExpression, the code might look like this:

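    // Backslashes are doubled because this is an Objective-C string literal.
    NSString *pattern = @"<span id='report-(\\d+)'>Report for \\w+ \\d+</span>";
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                           options:0
                                                                             error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
    }

    // Collect capture group 1 (the numeric ID) of every match.
    NSMutableArray *reportIds = [NSMutableArray array];
    [regex enumerateMatchesInString:string
                            options:0
                              range:NSMakeRange(0, [string length])
                         usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
        NSString *reportId = [string substringWithRange:[result rangeAtIndex:1]];
        [reportIds addObject:reportId];
    }];
    // For the sample above, reportIds contains @"9429", @"10522" and @"15212".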
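If you are stuck on Snow Leopard and would rather not add a regex library, a plain NSScanner loop can pull out the same IDs. This is a different technique from the regex approach above, offered only as a sketch. It assumes the markup always has exactly the form `<span id='report-NNN'>` (single quotes, no extra whitespace), and `ReportIDsInString` is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Cocoa API): returns every run of digits found
// between "<span id='report-" and "'>". Uses only NSScanner, which is
// available on Mac OS X 10.6. Assumes the exact markup shown in the question.
static NSArray *ReportIDsInString(NSString *html)
{
    NSMutableArray *ids = [NSMutableArray array];
    NSScanner *scanner = [NSScanner scannerWithString:html];
    [scanner setCharactersToBeSkipped:nil];   // do not silently skip whitespace

    NSString *prefix = @"<span id='report-";
    NSCharacterSet *digitSet = [NSCharacterSet decimalDigitCharacterSet];

    while (![scanner isAtEnd]) {
        // Move to the next occurrence of the prefix; stop if there is none left.
        [scanner scanUpToString:prefix intoString:NULL];
        if (![scanner scanString:prefix intoString:NULL]) {
            break;
        }
        // Read the digits after the prefix and require the closing "'>".
        NSString *digits = nil;
        if ([scanner scanCharactersFromSet:digitSet intoString:&digits]
            && [scanner scanString:@"'>" intoString:NULL]) {
            [ids addObject:digits];
        }
    }
    return ids;
}

int main(void)
{
    @autoreleasepool {
        NSString *html = @"<span id='report-9429'>Report for May 2009</span>\n"
                         @"A lot of code and text.\n"
                         @"<span id='report-10522'>Report for Apr 2009</span>\n"
                         @"A lot of code and text.\n"
                         @"<span id='report-15212'>Report for Apr 2009</span>\n";
        NSArray *ids = ReportIDsInString(html);
        // Logs the IDs (after NSLog's timestamp prefix): 9429, 10522, 15212
        NSLog(@"%@", [ids componentsJoinedByString:@", "]);
    }
    return 0;
}
```

Every pass through the loop either consumes the prefix or exits, so it cannot spin forever. The trade-off is rigidity: unlike the regex, it will not tolerate double quotes or extra whitespace inside the tag.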