
What is the benefit of NSScanner's charactersToBeSkipped?


I have the string @"    ILL WILL KILLS    ", and I'm using NSScanner's scanUpToString:intoString: to find every occurrence of "ILL". If it works correctly, it will NSLog 4, 9, and 14.

My string begins with 4 spaces, which I realize are members of NSScanner's default charactersToBeSkipped character set. If I set charactersToBeSkipped to nil, as in the example below, this code correctly finds the 3 occurrences of "ILL".

NSScanner* scanner = [NSScanner scannerWithString:@"    ILL WILL KILLS    "] ;
scanner.charactersToBeSkipped = nil ;
NSString* scannedCharacters ;
while ( TRUE ) {
    BOOL didScanUnignoredCharacters = [scanner scanUpToString:@"ILL" intoString:&scannedCharacters] ;
    if ( scanner.isAtEnd ) {
        break ;
    }
    NSLog(@"Found match at index:  %tu", scanner.scanLocation) ;
    // Since stopString "ILL" is 3 characters long, advance scanLocation by 3 to find the next "ILL".
    scanner.scanLocation += 3 ;
}

However, if I don't nullify the default charactersToBeSkipped, here's what happens:

  • scanner is initialized with scanLocation == 0.
  • The first call to scanUpToString "looks past" 4 empty spaces, "sees" ILL at index 4, and immediately stops. scanLocation is still 0.
  • I believe that I found a match, so I increment scanLocation by 3.
  • The second call to scanUpToString "looks past" 1 empty space, "sees" ILL at index 4, and immediately stops. scanLocation is still 3.

To me, it's a design flaw that the scanner stopped at scanLocation == 0 the first time, since I expected it to stop at scanLocation == 4. If you believe that the above code can be rewritten to correctly NSLog 4, 9, and 14 without setting charactersToBeSkipped to nil, then please show me how. For now, my opinion is that charactersToBeSkipped exists solely to make NSScanners more difficult to use.


Solution

  • For now, my opinion is that charactersToBeSkipped exists solely to make NSScanners more difficult to use.

    Then you aren't very imaginative. The "benefit" of charactersToBeSkipped is to… wait for it… skip characters. For example, if you have a string like @" 8 9 10 ", you can scan those three integers using -scanInt: three times. You don't have to care about the precise amount of whitespace that separates them.
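To make the integer case concrete, here is a minimal Objective-C sketch, assuming Foundation on macOS. The helper name ScanAllInts and the exact spacing of the sample string are illustrative assumptions, not from the original answer. The scanner keeps its default charactersToBeSkipped (whitespace and newlines), and -scanInt: is simply called until it fails.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation): collects every integer in
// `string` by calling -scanInt: repeatedly. The scanner's default
// charactersToBeSkipped (whitespace and newlines) steps over however much
// whitespace separates the numbers.
static NSArray *ScanAllInts(NSString *string) {
    NSScanner *scanner = [NSScanner scannerWithString:string];
    NSMutableArray *values = [NSMutableArray array];
    int value = 0;
    // -scanInt: returns NO once no further integer can be read,
    // e.g. when only skippable whitespace remains.
    while ([scanner scanInt:&value]) {
        [values addObject:@(value)];
    }
    return values;
}

int main(void) {
    @autoreleasepool {
        // Irregular spacing on purpose; the loop never has to account for it.
        for (NSNumber *number in ScanAllInts(@"  8   9  10 ")) {
            NSLog(@"Scanned %@", number);
        }
    }
    return 0;
}
```

The loop contains no whitespace-handling code at all; that bookkeeping is exactly what charactersToBeSkipped takes over.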

    Given the task you describe, where you're just looking for instances of a string within a string, NSScanner is probably not the right tool. You probably want to use -[NSString rangeOfString:options:range:].
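A minimal sketch of the -rangeOfString:options:range: approach, assuming Foundation; the helper name LocationsOfSubstring is hypothetical. Each search starts just past the previous match, mirroring the question's "advance by 3" step, so overlapping matches are not reported. Nothing is skipped, so the reported indices are the real positions in the string.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the start index of every non-overlapping
// occurrence of `needle` in `haystack`, as NSNumbers.
static NSArray *LocationsOfSubstring(NSString *haystack, NSString *needle) {
    NSMutableArray *locations = [NSMutableArray array];
    NSRange searchRange = NSMakeRange(0, haystack.length);
    while (searchRange.length > 0) {
        // options:0 uses the default (case-sensitive) comparison.
        NSRange found = [haystack rangeOfString:needle options:0 range:searchRange];
        if (found.location == NSNotFound) {
            break;
        }
        [locations addObject:@(found.location)];
        // Continue searching immediately after this match.
        NSUInteger next = NSMaxRange(found);
        searchRange = NSMakeRange(next, haystack.length - next);
    }
    return locations;
}

int main(void) {
    @autoreleasepool {
        for (NSNumber *location in LocationsOfSubstring(@"    ILL WILL KILLS    ", @"ILL")) {
            NSLog(@"Found match at index:  %@", location);
        }
    }
    return 0;
}
```

For the question's string this logs 4, 9, and 14.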

    The docs for -scanUpToString:intoString: are fairly clear. If stopString is the first string in the receiver (after any characters in charactersToBeSkipped have been skipped), then the method returns NO, meaning it didn't scan anything. Consequently, the scan location won't be changed.

    The return value indicates success or failure. If the stop string is next (ignoring characters to be skipped), then there's nothing to scan "up to" the stop string; the scanner is already at the stop string, so the method fails.
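If you do want to stay with NSScanner and keep the default charactersToBeSkipped, the following Objective-C sketch (assuming Foundation) relies on the return values described above instead of ignoring them, as the question's loop does with didScanUnignoredCharacters. A match is recorded only when -scanString:intoString: actually consumes the stop string, and its start is taken as scanLocation minus the stop string's length. The helper name ScannerLocationsOfString is hypothetical; setting caseSensitive to YES is an added assumption (NSScanner is case-insensitive by default), and the length arithmetic assumes plain ASCII text such as "ILL".

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: start index of every non-overlapping occurrence of
// `stopString`, found with NSScanner while keeping the default
// charactersToBeSkipped (whitespace and newlines).
static NSArray *ScannerLocationsOfString(NSString *string, NSString *stopString) {
    NSMutableArray *locations = [NSMutableArray array];
    if (stopString.length == 0) {
        return locations; // An empty stop string has no meaningful matches.
    }
    NSScanner *scanner = [NSScanner scannerWithString:string];
    // Assumption: exact-case matching is wanted; NSScanner defaults to NO.
    scanner.caseSensitive = YES;
    while (![scanner isAtEnd]) {
        NSUInteger before = scanner.scanLocation;
        // Consume any text that precedes the next stopString. A NO result
        // only means stopString (after skipped whitespace) is already next.
        [scanner scanUpToString:stopString intoString:NULL];
        // A match is real only if scanString: consumes stopString. It skips
        // leading whitespace first, so the match starts `length` characters
        // before the new scanLocation.
        if ([scanner scanString:stopString intoString:NULL]) {
            [locations addObject:@(scanner.scanLocation - stopString.length)];
        }
        if (scanner.scanLocation == before) {
            break; // Defensive: stop if neither call made progress.
        }
    }
    return locations;
}

int main(void) {
    @autoreleasepool {
        for (NSNumber *location in ScannerLocationsOfString(@"    ILL WILL KILLS    ", @"ILL")) {
            NSLog(@"Found match at index:  %@", location);
        }
    }
    return 0;
}
```

Whether a failed call leaves scanLocation at 0 or moves it past the skipped spaces does not matter here, because the position is read only after -scanString:intoString: succeeds.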