Tags: objective-c, tokenize, parsekit

How do I customize PKTokenizer in ParseKit to tokenize substrings?


Say I want to parse substrings with ParseKit, such as the prefix of a word. For example, I want to parse 'preview' and 'review'. My grammar might be:

@start  = prefix 'view';
prefix = 'pre' | 're';

Now, without modifying ParseKit, I can match 'pre view' and 're view' but not 'preview' or 'review'. From the documentation, I guess I need to customize PKTokenizer's word state, because it looks for whitespace to terminate a 'Word' token. How do I get around that?


Solution

  • Developer of ParseKit here.

    I'm not sure I fully understand the question, but I think it sounds somewhat misguided.

    If you are looking for a way to match sub-tokens or characters, Regular Expressions might be a better fit for your needs than ParseKit.
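
    As a rough sketch of the regular-expression route, the following uses only Foundation's NSRegularExpression, not ParseKit. The helper name PrefixOfViewWord and the pattern ^(pre|re)view$ are assumptions made up for illustration; they come from the question's example words, not from any library.

    ```objective-c
    #import <Foundation/Foundation.h>

    // Hypothetical helper (not part of ParseKit): returns @"pre" or @"re" if
    // `word` is exactly that prefix followed by "view", otherwise nil.
    static NSString *PrefixOfViewWord(NSString *word) {
        if (word == nil) {
            return nil;
        }

        NSError *error = nil;
        NSRegularExpression *regex =
            [NSRegularExpression regularExpressionWithPattern:@"^(pre|re)view$"
                                                      options:0
                                                        error:&error];
        if (!regex) {
            return nil; // The pattern failed to compile.
        }

        NSTextCheckingResult *match =
            [regex firstMatchInString:word
                              options:0
                                range:NSMakeRange(0, word.length)];
        if (!match) {
            return nil; // Not 'preview' or 'review'.
        }

        // Capture group 1 holds the prefix.
        return [word substringWithRange:[match rangeAtIndex:1]];
    }
    ```

    Calling PrefixOfViewWord(@"preview") yields @"pre", while an input such as @"overview" yields nil because the pattern is anchored at both ends.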

    A ParseKit grammar matches against tokens produced by the ParseKit tokenizer (PKTokenizer class), not against individual characters.

    It's not impossible for PKTokenizer to produce a pre token and a view token from the input preview, but it would require customizing the code in a way that I would call unwise and unnecessarily complicated.

    If you want to use ParseKit (rather than Regex) anyway, you can simply do the sub-parsing in your assembler callbacks (instead of in the grammar).

    So in the Grammar:

    @start = either;
    either = 'preview' | 'review';
    

    And in ObjC:

    // Assembler callback invoked when the 'either' production matches.
    - (void)parser:(PKParser *)p didMatchEither:(PKAssembly *)a {
        PKToken *tok = [a pop]; // the matched word token
        NSString *str = tok.stringValue;
    
        if ([str hasPrefix:@"pre"]) {
            // ... handle 'preview'
        } else {
            // ... handle 'review'
        }
    }
    

    Also remember that ParseKit grammars support matching tokens via regular expressions. So if you want to match all words that end in view:

    @start = anyView;
    anyView = /\b\w*?view\b/;
    
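    To get a feel for which words that pattern accepts, it can be tried outside ParseKit. This is a minimal sketch using Foundation's NSRegularExpression; the helper name MatchesAnyView is hypothetical, and the sketch does not show how ParseKit itself applies a pattern to a token. It only checks the pattern against a whole string.

    ```objective-c
    #import <Foundation/Foundation.h>

    // Hypothetical helper (not part of ParseKit): YES if the whole of `word`
    // is matched by the pattern \b\w*?view\b used in the grammar above.
    static BOOL MatchesAnyView(NSString *word) {
        if (word == nil) {
            return NO;
        }

        // Backslashes are doubled because this is an Objective-C string literal.
        NSRegularExpression *regex =
            [NSRegularExpression regularExpressionWithPattern:@"\\b\\w*?view\\b"
                                                      options:0
                                                        error:NULL];
        if (!regex) {
            return NO; // The pattern failed to compile.
        }

        NSRange whole = NSMakeRange(0, word.length);
        NSTextCheckingResult *match =
            [regex firstMatchInString:word options:0 range:whole];

        // Require the match to span the entire string, mimicking a single token.
        return match != nil && NSEqualRanges(match.range, whole);
    }
    ```

    With this helper, MatchesAnyView(@"overview") and MatchesAnyView(@"view") return YES, while MatchesAnyView(@"viewer") returns NO because no word boundary follows "view".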

    Hope that helps.