objective-c cocoa parsing performance nsscanner

Nested NSScanner Efficiency


Is running a nested NSScanner the most efficient method for parsing out a string of repeating elements or can the scanning be done in one pass?

I have a string that is returned from a command-line call (via NSTask) to Apple's Compressor. The actual string contains no line breaks; the breaks below are there only so the question is legible without scrolling:

<jobStatus name="compressor.motn" submissionTime="12/4/10 3:56:16 PM"
 sentBy="localuser" jobType="Compressor" priority="HighPriority" 
 timeElapsed="32 second(s)" timeRemaining="0" timeElapsedSeconds="32"
 timeRemainingSeconds="0" percentComplete="100" resumePercentComplete="100"
 status="Successful" jobid="CD4046D8-CDC1-4F2D-B9A8-460DF6AF184E" 
 batchid="0C9041F5-A499-4D00-A26A-D7508EAF3F85" /jobStatus>

These blocks repeat within the same string, so the returned string can contain anywhere from zero to n of them:

<jobstatus .... /jobstatus><jobstatus .... /jobstatus>
<jobstatus .... /jobstatus>

In addition, the string may include other tags that are of no significance to my code (batchstatus in this example):

<jobstatus .... /jobstatus><batchstatus .... /batchstatus>
<jobstatus .... /jobstatus>

What gets returned is NOT an XML document, merely a series of status blocks that happen to be wrapped in XML-like tags. None of the blocks are nested; they are all sequential. I have no control over the data being returned.

My goal (which the code below currently achieves) is to parse the string into "jobs", each one a dictionary of the details within a jobstatus block. Any other blocks (such as batchstatus) and any other strings are ignored. I am only concerned with the contents of the jobstatus blocks.

NSScanner * jobScanner = [NSScanner scannerWithString:dataAsString];
NSScanner * detailScanner = nil;

NSMutableDictionary * jobDictionary = [NSMutableDictionary dictionary];
NSMutableArray * jobsArray = [NSMutableArray array];

NSString * key = @"";
NSString * value = @"";

NSString * jobStatus = @"";

NSCharacterSet * whitespace = [NSCharacterSet whitespaceCharacterSet];

while ([jobScanner isAtEnd] == NO) {

    if ([jobScanner scanUpToString:@"<jobstatus " intoString:NULL] &&
        [jobScanner scanUpToCharactersFromSet:whitespace intoString:NULL] &&
        [jobScanner scanUpToString:@" /jobstatus>" intoString:&jobStatus]) {

        detailScanner = [NSScanner scannerWithString:jobStatus];

        [jobDictionary removeAllObjects];

        while ([detailScanner isAtEnd] == NO) {

            if ([detailScanner scanUpToString:@"=" intoString:&key] &&
                [detailScanner scanString:@"=\"" intoString:NULL] &&
                [detailScanner scanUpToString:@"\"" intoString:&value] &&
                [detailScanner scanString:@"\"" intoString:NULL]) {

                [jobDictionary setObject:value forKey:key];

                //NSLog(@"Key:(%@) Value:(%@)", key, value);
            }
        }

        [jobsArray addObject:
         [NSDictionary dictionaryWithDictionary:jobDictionary]];
    }

}

NSLog(@"Jobs Dictionary:%@", jobsArray);

The above code produces the following log output:

Jobs Dictionary:(
    {
    batchid = "0C9041F5-A499-4D00-A26A-D7508EAF3F85";
    jobType = Compressor;
    jobid = "CD4046D8-CDC1-4F2D-B9A8-460DF6AF184E";
    name = "compressor.motn";
    percentComplete = 100;
    priority = HighPriority;
    resumePercentComplete = 100;
    sentBy = localuser;
    status = Successful;
    submissionTime = "12/4/10 3:56:16 PM";
    timeElapsed = "32 second(s)";
    timeElapsedSeconds = 32;
    timeRemaining = 0;
    timeRemainingSeconds = 0;
    }
)

Here's the concern. My code scans through the string and, each time it finds a block of data, scans through that block again to build a dictionary that is added to an array. This effectively means the string gets walked twice. This happens every 15 to 30 seconds or so, and the string could contain hundreds of jobs, so I see it as a potential CPU and memory hog. The app running this could be on the same machine as the Compressor app (which is already a memory and CPU hog), so I don't want to add any burden if I don't have to.

Is there a better way that I should be using NSScanner as I walk through it to get the data?

Any advice or recommendation much appreciated!


Solution

  • Your nesting is fine: constructing detailScanner from the jobStatus string that jobScanner scanned is not a problem. You do have two other problems, though. One is that you're sweating whitespace characters too much. The worse one is that your outermost loop can fail to exit because of the way your initial if conditional is formed.

    Change

    if ([jobScanner scanUpToString:@"<jobstatus " intoString:NULL] &&
    [jobScanner scanUpToCharactersFromSet:whitespace intoString:NULL] &&
    [jobScanner scanUpToString:@" /jobstatus>" intoString:&jobStatus])
    

    to

    if ([jobScanner scanString:@"<jobstatus" intoString:NULL] && 
    [jobScanner scanUpToString:@"/jobstatus>" intoString:&jobStatus] && 
    [jobScanner scanString:@"/jobstatus>" intoString:NULL])
    

    Of course, you can also remove the line in which you cache your whitespace character set. You don't need to scan whitespace characters, and you don't need to include them in the strings you scan or scan up to, because scanners skip whitespace and newline characters by default. Uncommenting your first NSLog statement bears this out: there are no stray spaces anywhere in the output.
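
    As a small illustration of that default, the sketch below reads a key without any explicit whitespace handling. FirstKey is a hypothetical helper introduced only for this example; it relies on the scanner's default charactersToBeSkipped set (whitespace and newlines).

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical helper for illustration; not part of the original code.
    // Returns the text before the first "=", or nil if nothing was scanned.
    static NSString *FirstKey(NSString *text) {
        NSScanner *scanner = [NSScanner scannerWithString:text];
        NSString *key = nil;
        // No whitespace set is needed here: characters in the scanner's
        // default skip set are dropped before the scan begins.
        [scanner scanUpToString:@"=" intoString:&key];
        return key;
    }
    ```

    Calling FirstKey(@"   name=\"compressor.motn\"") returns name with the leading blanks dropped. Skipping happens only before each scan, so whitespace inside a scanned value such as "32 second(s)" is kept.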

    What you do need, once you've scanned up to a given string, is to scan that string itself; otherwise the scanner never moves past it toward the end of the input for your next iteration.
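
    To make this concrete, here is a minimal sketch of the nested-scanner loop with the terminator consumed on every pass. The helper names ParseAttributes and ParseJobs are hypothetical, not from the original code. The sketch also makes two assumptions beyond the conditional shown above: unrelated blocks such as batchstatus may sit in front of a job block, so it first scans up to the opening tag (ignoring the result, because scanUpToString: returns NO when the scanner is already at the target), and attribute values may be empty.

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical helper: turns key="value" pairs into a dictionary.
    static NSDictionary *ParseAttributes(NSString *body) {
        NSScanner *detailScanner = [NSScanner scannerWithString:body];
        NSMutableDictionary *details = [NSMutableDictionary dictionary];

        while (![detailScanner isAtEnd]) {
            NSString *key = nil;
            NSString *value = @"";
            if (![detailScanner scanUpToString:@"=" intoString:&key] ||
                ![detailScanner scanString:@"=\"" intoString:NULL]) {
                break; // malformed attribute: stop rather than spin forever
            }
            // Result ignored on purpose: an empty value scans nothing.
            [detailScanner scanUpToString:@"\"" intoString:&value];
            if (![detailScanner scanString:@"\"" intoString:NULL]) {
                break; // missing closing quote
            }
            [details setObject:value forKey:key];
        }
        return details;
    }

    // Hypothetical helper: one dictionary per jobstatus block.
    static NSArray *ParseJobs(NSString *dataAsString) {
        NSScanner *jobScanner = [NSScanner scannerWithString:dataAsString];
        NSMutableArray *jobs = [NSMutableArray array];

        while (![jobScanner isAtEnd]) {
            // Skip anything in front of the next job block (assumption:
            // other blocks such as batchstatus can appear here).
            [jobScanner scanUpToString:@"<jobstatus " intoString:NULL];
            if (![jobScanner scanString:@"<jobstatus" intoString:NULL]) {
                break; // no further job blocks
            }
            NSString *body = @"";
            [jobScanner scanUpToString:@"/jobstatus>" intoString:&body];
            // Consume the terminator so the next iteration moves forward.
            if (![jobScanner scanString:@"/jobstatus>" intoString:NULL]) {
                break; // unterminated block
            }
            [jobs addObject:ParseAttributes(body)];
        }
        return jobs;
    }
    ```

    Every iteration either consumes an opening tag or breaks, so the loop cannot stall on input that contains no further job blocks.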

    Other than that, I think your approach is sound.
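
    On the one-pass part of the question: a single scanner can also read the attributes directly until it meets the closing tag, which avoids creating a second scanner and an intermediate substring for each job. The following is only a sketch that assumes the data looks like the sample in the question. ParseJobsSinglePass is a hypothetical name, and whether this is measurably faster than the nested version would need profiling.

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical single-scanner variant: attributes are read in place
    // until the closing tag of the current block is reached.
    static NSArray *ParseJobsSinglePass(NSString *text) {
        NSScanner *scanner = [NSScanner scannerWithString:text];
        NSMutableArray *jobs = [NSMutableArray array];

        while (![scanner isAtEnd]) {
            // Skip unrelated content; the result is NO when the scanner
            // is already at a job block, so it is ignored.
            [scanner scanUpToString:@"<jobstatus " intoString:NULL];
            if (![scanner scanString:@"<jobstatus" intoString:NULL]) {
                break; // no further job blocks
            }

            NSMutableDictionary *job = [NSMutableDictionary dictionary];
            BOOL closed = NO;
            while (![scanner isAtEnd]) {
                if ([scanner scanString:@"/jobstatus>" intoString:NULL]) {
                    closed = YES; // end of this block
                    break;
                }
                NSString *key = nil;
                NSString *value = @"";
                if (![scanner scanUpToString:@"=" intoString:&key] ||
                    ![scanner scanString:@"=\"" intoString:NULL]) {
                    break; // malformed attribute
                }
                // Result ignored on purpose: an empty value scans nothing.
                [scanner scanUpToString:@"\"" intoString:&value];
                if (![scanner scanString:@"\"" intoString:NULL]) {
                    break; // missing closing quote
                }
                [job setObject:value forKey:key];
            }

            if (!closed) {
                break; // truncated or malformed block: stop parsing
            }
            [jobs addObject:job];
        }
        return jobs;
    }
    ```

    The trade-off is that the attribute loop is no longer confined to one block's substring, so a malformed block ends the parse instead of being isolated. The nested version keeps that isolation at the cost of a second scanner per job.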