objective-c, parsekit

ParseKit gives unexpected calls to selectors


I have the following very simple (test) grammar file:

@start = expression+;
expression = keyword | otherWord;
otherWord = Word;
keyword = a | the;
a = 'a';
the = 'the';

Then I run the following code:

// Grammar contains the contents of the above grammar file.
PKParser *parser = [[PKParserFactory factory] parserFromGrammar:grammar assembler:self];
NSString *s = @"The parrot";
[parser parse:s];
PKReleaseSubparserTree(parser);

And the following methods:

- (void)didMatchA:(PKAssembly *)a{
    [self log:a type:@"didMatchA          "];
}
- (void)didMatchThe:(PKAssembly *)a{
    [self log:a type:@"didMatchThe        "];
}
- (void)didMatchKeyword:(PKAssembly *)a{
    [self log:a type:@"didMatchKeyword    "];
}
- (void)didMatchExpression:(PKAssembly *)a{
    [self log:a type:@"didMatchExpression "];
}
- (void)didMatchOtherWord:(PKAssembly *)a{
    [self log:a type:@"didMatchOtherWord  "];
}

- (void)log:(PKAssembly *)assembly type:(NSString *)type {
    PKToken *token = [assembly top];
    NSLog(@"Method: [%@], token: %@, assembly: %@", type, token, assembly);
}

And finally I get these messages in the log:

[1] Method: [didMatchThe        ], token: The, assembly: [The]The^parrot
[2] Method: [didMatchKeyword    ], token: The, assembly: [The]The^parrot
[3] Method: [didMatchOtherWord  ], token: The, assembly: [The]The^parrot
[4] Method: [didMatchExpression ], token: The, assembly: [The]The^parrot
[5] Method: [didMatchExpression ], token: The, assembly: [The]The^parrot
[6] Method: [didMatchOtherWord  ], token: parrot, assembly: [The, parrot]The/parrot^
[7] Method: [didMatchExpression ], token: parrot, assembly: [The, parrot]The/parrot^

This sort of makes sense, but I cannot see why [5] occurs. I'd really like to be able to remove the double matching so that keywords such as "The" only trigger didMatchThe and not didMatchKeyword.

Unfortunately, the ParseKit documentation seems to be non-existent when it comes to the grammar syntax and how the parser decides which methods to trigger. Yes, I've trawled the source code too :-)

Has anyone got experience with parsekit and can shed some light on this?


Solution

  • I'm the developer of ParseKit, and this is actually correct behavior. Here are a few points to help clear this up:

    1. The best way to learn about how ParseKit works is to buy "Building Parsers with Java" by Steven John Metsker. ParseKit is based almost entirely on the designs laid out there.

    2. ParseKit's parser component is extremely dynamic and features infinite look-ahead. This makes it ideal for quick development or for parsing small inputs, but it also means ParseKit performs very poorly when parsing large documents.

    3. Because of ParseKit's infinite look-ahead, the assembler methods you implement will be called many times. In fact, they will appear to be called too many times, as you've described above. This is normal: ParseKit explores every parse path available to it at any time, so you get "too many" callbacks.

    4. The answer is to never keep state in your assembler object's own ivars from within the callback methods. Instead, always keep the state of what you are working on in the current PKAssembly's target property:

      a.target

    The current PKAssembly is the one passed into your callback method.
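
    As a rough illustration of point 4, here is a minimal sketch of an assembler that collects keywords on the assembly's target rather than on one of its own ivars. The class name KeywordCollector is hypothetical, and the sketch assumes that ParseKit copies the target along with the assembly when it explores an alternative parse path, which is why a mutable, copyable collection is used. Treat it as a starting point, not as the definitive way to write an assembler.

    ```objective-c
    #import <Foundation/Foundation.h>
    #import <ParseKit/ParseKit.h>

    // Hypothetical assembler class; the name is illustrative only.
    @interface KeywordCollector : NSObject
    @end

    @implementation KeywordCollector

    // Called whenever the `keyword` production matches on some parse path.
    - (void)didMatchKeyword:(PKAssembly *)a {
        // Per-path state lives on the assembly, not on self.
        // Assumption: an NSMutableArray target is copied with the assembly.
        NSMutableArray *keywords = a.target;
        if (!keywords) {
            keywords = [NSMutableArray array];
            a.target = keywords;
        }

        // Take the matched token off this assembly's stack and record it.
        PKToken *tok = [a pop];
        [keywords addObject:tok.stringValue];
    }

    @end
    ```

    Because the state travels with each assembly, a callback that fires on a parse path ParseKit later abandons only changes that path's assembly, and the extra callbacks stop mattering.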

    Hope that helps.