NSScanner Loop Question


I have an NSScanner object that scans through HTML documents for paragraph tags. It seems like the scanner stops at the first result it finds, but I need all the results in an array.

How can my code be improved to go through an entire document?

- (NSArray *)getParagraphs:(NSString *) html 
{
    NSScanner *theScanner;
    NSString *text = nil;

    theScanner = [NSScanner scannerWithString: html];

    NSMutableArray*paragraphs = [[NSMutableArray alloc] init];

    // find start of tag
    [theScanner scanUpToString: @"<p>" intoString: NULL];
    if ([theScanner isAtEnd] == NO) {
        NSInteger newLoc = [theScanner scanLocation] + 10;
        [theScanner setScanLocation: newLoc];

        // find end of tag
        [theScanner scanUpToString: @"</p>" intoString: &text];

        [paragraphs addObject:text];
    }

    return text;
}

Solution

  • Disclaimer: To parse HTML, it is better to use an HTML parser such as libxml's HTML 4 parser, especially when dealing with arbitrary, possibly malformed HTML. Since the question asks how to improve existing code that uses NSScanner, I provide the following example. It works in most cases, but there are some corner cases where it won't. For serious HTML parsing, use an HTML parser.


    Iterate until the scanner has exhausted all characters:

    NSScanner *scanner = [NSScanner scannerWithString:html];
    NSMutableArray *paragraphs = [[NSMutableArray alloc] init];
    // Move to the first opening tag; "<p" (without ">") also matches <p class="...">
    [scanner scanUpToString:@"<p" intoString:NULL];
    while (![scanner isAtEnd]) {
        // Skip the rest of the opening tag, including any attributes
        [scanner scanUpToString:@">" intoString:NULL];
        [scanner scanString:@">" intoString:NULL];
        NSString *text = nil;
        // Capture everything up to the closing tag
        [scanner scanUpToString:@"</p>" intoString:&text];
        if (text) { // if html contains empty paragraphs <p></p>, text could be nil
            [paragraphs addObject:text];
        }
        // Move to the next opening tag, or to the end of the string
        [scanner scanUpToString:@"<p" intoString:NULL];
    }
    ...
    [paragraphs release];
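
For reference, here is a minimal sketch of the same loop wrapped in a complete function that returns the array (the method in the question returns `text` even though it is declared to return an NSArray). The function name `ParagraphsInHTML` is hypothetical and not part of any API. It uses an autoreleased array so it compiles with or without ARC. Note that the `<p` prefix also matches tags such as `<pre>`, which is one of the corner cases mentioned above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects the text between each <p ...> tag
// and the next </p>. Nested or malformed markup is not handled.
static NSArray *ParagraphsInHTML(NSString *html)
{
    NSScanner *scanner = [NSScanner scannerWithString:html];
    NSMutableArray *paragraphs = [NSMutableArray array];

    // Move to the first opening tag, if there is one.
    [scanner scanUpToString:@"<p" intoString:NULL];
    while (![scanner isAtEnd]) {
        // Skip the rest of the opening tag, including any attributes.
        [scanner scanUpToString:@">" intoString:NULL];
        [scanner scanString:@">" intoString:NULL];

        // scanUpToString: returns NO when nothing was scanned,
        // for example for an empty paragraph <p></p>.
        NSString *text = nil;
        if ([scanner scanUpToString:@"</p>" intoString:&text]) {
            [paragraphs addObject:text];
        }

        // Move to the next opening tag, or to the end of the string.
        [scanner scanUpToString:@"<p" intoString:NULL];
    }
    return paragraphs;
}
```

Called as `ParagraphsInHTML(@"<p>One</p><div>x</div><p class=\"a\">Two</p>")`, this should return an array containing `One` and `Two`.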