I need to parse large HTML files and extract the substrings that match a certain pattern. For example:
<span id='report-9429'>Report for May 2009</span>
A lot of code and text.
<span id='report-10522'>Report for Apr 2009</span>
A lot of code and text.
<span id='report-15212'>Report for Apr 2009</span>
Here 9429, 10522 and 15212 are the parts I need to extract as an array of substrings. The file contains many of these spans, and I need all of them.
Is there some sort of regular expression support in Cocoa? And what would such a regex look like?
You could use NSRegularExpression, but it is only available on iOS 4.0+ and Mac OS X 10.7 (Lion)+, so it won't work on Snow Leopard; there, RegexKit is an alternative.
Your regex might look like this:
<span id='report-(\d+)'>Report for \w+ \d+</span>
For NSRegularExpression, the code might look like this:
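// Backslashes must be doubled inside an Objective-C string literal, otherwise \d and \w are not passed to the regex engine.
NSString *pattern = @"<span id='report-(\\d+)'>Report for \\w+ \\d+</span>";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                       options:0
                                                                         error:&error];
if (regex == nil) {
    NSLog(@"Could not compile pattern: %@", error);
}
// `string` is assumed to hold the HTML file's contents, e.g. read with stringWithContentsOfFile:encoding:error:.
[regex enumerateMatchesInString:string
                        options:0
                          range:NSMakeRange(0, [string length])
                     usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
    // Range 0 is the whole <span>; range 1 is the (\d+) capture group, i.e. the report ID.
    NSString *reportId = [string substringWithRange:[result rangeAtIndex:1]];
    // Do something with reportId
}];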
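Since the question asks for the IDs as an array, here is a minimal sketch that collects them into an NSArray with matchesInString:options:range: instead of enumerating with a block. The function name ReportIDsInString is hypothetical, and the sketch assumes NSRegularExpression is available (iOS 4.0+ / Mac OS X 10.7+).

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every report ID captured by group 1, in the
// order the <span> tags appear in `html`. Returns nil if the pattern fails
// to compile; *outError is set in that case when outError is non-NULL.
NSArray *ReportIDsInString(NSString *html, NSError **outError)
{
    // Backslashes are doubled because this is an Objective-C string literal.
    NSString *pattern = @"<span id='report-(\\d+)'>Report for \\w+ \\d+</span>";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:outError];
    if (regex == nil) {
        return nil;
    }

    // Collect all non-overlapping matches over the whole string, in document order.
    NSArray *matches = [regex matchesInString:html
                                      options:0
                                        range:NSMakeRange(0, [html length])];

    NSMutableArray *ids = [NSMutableArray arrayWithCapacity:[matches count]];
    for (NSTextCheckingResult *match in matches) {
        // Range 1 is the (\d+) capture group; range 0 would be the whole <span>.
        [ids addObject:[html substringWithRange:[match rangeAtIndex:1]]];
    }
    return ids;
}
```

Called on the three span lines from the question, it should return @"9429", @"10522" and @"15212" in that order. If only the IDs matter, a looser pattern such as <span id='report-(\d+)'> would also catch spans whose text is not exactly "Report for <Month> <Year>".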