Tags: javascript, iphone, ios, objective-c, uiwebview

How to get all <img src> of a web page in iOS UIWebView?


Hi everyone,

I'm trying to get all the image URLs of the current page in a UIWebView.

Here is my code:

- (void)webViewDidFinishLoad:(UIWebView*)webView {
    NSString *firstImageUrl = [self.webView stringByEvaluatingJavaScriptFromString:@"var images = document.getElementsByTagName('img');images[0].src.toString();"];
    NSString *imageUrls = [self.webView stringByEvaluatingJavaScriptFromString:@"var images= document.getElementsByTagName('img');var imageUrls = "";for(var i = 0; i < images.length; i++){var image = images[i];imageUrls += image.src;imageUrls += \\’,\\’;}imageUrls.toString();"];
    NSLog(@"firstUrl : %@", firstImageUrl);
    NSLog(@"images : %@",imageUrls);
}

The first NSLog prints the correct image src, but the second NSLog prints nothing.

2013-01-25 00:51:23.253 WebDemo[3416:907] firstUrl: https://www.paypalobjects.com/en_US/i/scr/pixel.gif
2013-01-25 00:51:23.254 WebDemo[3416:907] images :

I don't know why. Please help me...

Thanks.


Solution

  • Perrohunter pointed out an NSRegularExpression solution, which is great. If you don't want to build and then walk the array of matches, you can use the block-based enumerateMatchesInString:options:range:usingBlock: method instead:

    NSError *error = nil;
    // Capture group 1 is the whole img tag; capture group 2 is the src value.
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(<img\\s[\\s\\S]*?src\\s*?=\\s*?['\"](.*?)['\"][\\s\\S]*?>)+?"
                                                                           options:NSRegularExpressionCaseInsensitive
                                                                             error:&error];
    if (!regex) {
        NSLog(@"could not compile regex: %@", error);
    }

    // The block is called once per match, so no array of matches is built.
    [regex enumerateMatchesInString:yourHTMLSourceCodeString
                            options:0
                              range:NSMakeRange(0, [yourHTMLSourceCodeString length])
                         usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {

                             // Index 2 selects the src value, not the whole tag.
                             NSString *img = [yourHTMLSourceCodeString substringWithRange:[result rangeAtIndex:2]];
                             NSLog(@"img src %@",img);
                         }];
    

    I've also updated the regex pattern to deal with the following issues:

    • there can be attributes between the start img tag and the src attribute;
    • there can be attributes after the src attribute and before the >;
    • there can be newline characters in the middle of an img tag (the . matches any character except a newline, which is why the pattern uses [\s\S] in those positions);
    • the src attribute value can be quoted with ' as well as "; and
    • there can be spaces between src and the = as well as between the = and the subsequent value.
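If you want to see those permutations in action, here is a rough sketch that runs the same expression, written as a JavaScript regex literal, against a few sample tags. The function name extractImageSources and the sample HTML are made up for illustration and are not part of the original answer. One caveat to keep in mind: because [\s\S]*? is free to cross tag boundaries, an img tag that has no src attribute at all can be matched together with whatever follows it.

```javascript
// Hypothetical helper: returns the src value of every <img> tag that the
// pattern from the answer finds in an HTML string.
function extractImageSources(html) {
  // Same pattern as above; "g" finds every match, "i" ignores case.
  var pattern = /(<img\s[\s\S]*?src\s*?=\s*?['"](.*?)['"][\s\S]*?>)+?/gi;
  var sources = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    sources.push(match[2]); // capture group 2 is the src value
  }
  return sources;
}

// Made-up sample covering the cases listed above.
var sampleHtml = [
  '<p>intro</p><img src="a.png">',
  '<img alt="x" src=\'b.gif\' width="10">', // attributes around src, single quotes
  '<IMG',                                   // upper case, tag split across lines
  '  class="c"',
  '  SRC = "c.jpg"',                        // spaces around the =
  '/>'
].join('\n');

console.log(extractImageSources(sampleHtml));
// prints [ 'a.png', 'b.gif', 'c.jpg' ]
```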

    I freely recognize that reading regex patterns is painful for the uninitiated, and perhaps other solutions might make more sense (the JSON suggestion by Joris, using scanners, etc.). But if you wanted to use regex, the above pattern might cover a few more permutations of the img tag, and enumerateMatchesInString might be ever so slightly more efficient than matchesInString.
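For comparison, the JSON route mentioned above could look roughly like this. It is only a sketch: collectImageSources is a hypothetical helper name, and taking the document as a parameter is my own choice so the function can be tried outside a web view. It also sidesteps the quoting trouble in the question's second snippet, where the unescaped "" appears to end the Objective-C string literal early, so the JavaScript that reaches the web view is no longer valid and nothing comes back.

```javascript
// Hypothetical helper: returns a JSON array string holding the src of
// every <img> element in the given document-like object.
function collectImageSources(doc) {
  var images = doc.getElementsByTagName('img');
  var urls = [];
  for (var i = 0; i < images.length; i++) {
    urls.push(images[i].src);
  }
  return JSON.stringify(urls);
}

// Stand-in for a real DOM document so the sketch can run under Node.
var fakeDocument = {
  getElementsByTagName: function (tagName) {
    return tagName === 'img'
      ? [{ src: 'https://example.com/a.png' }, { src: 'https://example.com/b.gif' }]
      : [];
  }
};

console.log(collectImageSources(fakeDocument));
// prints ["https://example.com/a.png","https://example.com/b.gif"]
```

Inside the web view the call would be collectImageSources(document), and the string handed back by stringByEvaluatingJavaScriptFromString: can then be parsed with NSJSONSerialization instead of being split on commas.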