Tags: objective-c, ios9.3, nsscanner

Why is NSScanner not finding the 1st occurrence of the target string?


I have an Objective-C app that was working; now it isn't. This is the portion of the string that I am trying to parse:

<div class="PostContent"><div class="Article"><div class="Post"><div class="PostContent"> <div><img style="background-image: url('http://cdn.openisbn.com/images/no_book_cover.jpg');border: solid 1px #383c40; " src=/cover/0345377443_220.jpg width=220 border=0 title="Women Who Run With The Wolves: Myths And Stories Of The Wild Woman Archetype"></div>Authors: <a href="/author/Clarissa_Pinkola_Estes/">Clarissa Pinkola Estes</a><BR>Publisher: <a href="/publisher/Ballantine_Books/">Ballantine Books</a>

Several thousand characters later, this text appears:

<div class="block" id="LayoutColumn_3"><div class="blockTop"></div><h2</h2><div align="center"><a href="/isbn/006251380X/" ><img style="padding:1px;border:1px solid #6c6c6c; background-image: url('http://cdn.openisbn.com/images/no_book_cover.jpg');" src=/cover/006251380X_72.jpg width=72 height=114 border=0 title="The Faithful Gardener: A Wise Tale About That Which Can Never Die"></a><BR><a href="/isbn/006251380X/" >The Faithful Gardener: A Wise Tale About That Which Can Never Die</a><BR><a href="/isbn/1604076356/" ><img style="padding:1px;border:1px solid #6c6c6c; background-image: url('http://cdn.openisbn.com/images/no_book_cover.jpg');"

This is my code to find the title:

[scanner setScanLocation:0];
[scanner setCaseSensitive:NO];
[scanner scanUpToString:@" border=0 title=\"" intoString:nil];  //  title
scanner.scanLocation += 17;
[scanner scanUpToString:@"\">" intoString:&tempString];
oTitle.text = tempString;

What's happening is that the scanner skips the first occurrence of the target string (Women Who Run...), finds the second occurrence (The Faithful Gardener), and returns that instead. Since this used to work and I haven't changed the code, can someone tell me why it no longer works, and possibly suggest changes to get it working again? I would really appreciate it!


Solution

  • The reason it's not finding the first occurrence is that that particular instance appears to have two spaces between border=0 and title="...":

    <img style="..." src=... width=220  border=0  title="Women Who Run With ...">
    

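    Your scanner is looking for a string with only a single space between the two attributes, so scanUpToString: passes over the first img tag and stops at the next one that matches exactly.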
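If you want to stay with NSScanner for now, one way to make the scan tolerant of extra spaces is to match the two attributes separately and let the scanner's default skipping of whitespace absorb whatever sits between them. The following is only a minimal sketch under assumptions: the helper name FirstTitleAfterBorder is hypothetical (it is not in the original code), and it assumes the title value contains no embedded double quote.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): returns the value of the
// first title="..." attribute that directly follows border=0, regardless of
// how much whitespace separates the two attributes. Returns nil if none is found.
static NSString *FirstTitleAfterBorder(NSString *html) {
    NSScanner *scanner = [NSScanner scannerWithString:html];
    scanner.caseSensitive = NO;
    // NSScanner skips whitespace and newlines by default before each scan,
    // so scanString: below steps over one space, two spaces, or a line break.

    while (![scanner isAtEnd]) {
        // Advance to the next "border=0" and consume it; stop if there is none.
        [scanner scanUpToString:@"border=0" intoString:NULL];
        if (![scanner scanString:@"border=0" intoString:NULL]) {
            break;
        }
        // Only accept a title attribute that comes right after border=0.
        if ([scanner scanString:@"title=\"" intoString:NULL]) {
            NSString *title = nil;
            [scanner scanUpToString:@"\"" intoString:&title];
            return title;
        }
    }
    return nil;
}
```

With this approach you would assign oTitle.text = FirstTitleAfterBorder(htmlString), where htmlString stands for whatever string your scanner was created from. It also removes the hard-coded scanLocation += 17 offset, which only holds for one exact spelling of the search string.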


    Personally, I would suggest using an HTML parser. It's a little daunting the first time you use one, but it's an extremely powerful and flexible way of parsing HTML, and it gets you out of the weeds of character-by-character scanning of the input. It's designed for precisely this sort of problem. See TFHpple or the Ray Wenderlich tutorial on how to parse HTML.