How do I customize PKTokenizer in ParseKit to tokenize substrings?

Say I want to parse substrings with ParseKit, such as the prefix of a word. For example, I want to parse 'preview' and 'review', so my grammar might be:

@start  = prefix 'view';
prefix = 'pre' | 're';

Without modifying ParseKit, I can match 'pre view' and 're view' but not 'preview' or 'review'. From the documentation, I guess I need to customize PKTokenizer's word state, because it looks for whitespace to terminate a 'Word' token. How do I get around that?

Solution

Developer of ParseKit here.

I'm not sure I fully understand the question, but the approach sounds somewhat misguided.

If you are looking for a way to match sub-tokens or characters, Regular Expressions might be a better fit for your needs than ParseKit.

A ParseKit grammar matches against tokens produced by the ParseKit tokenizer (the PKTokenizer class), not against individual characters.

It's not impossible for PKTokenizer to produce a pre token and a view token from the input preview, but doing so would require customizing the tokenizer code in a way I would call unwise and unnecessarily complicated.

If you want to use ParseKit (rather than Regex) anyway, you can simply do the sub-parsing in your assembler callbacks (instead of in the grammar).

So in the grammar:

@start = either;
either = 'preview' | 'review';

And in ObjC:

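- (void)parser:(PKParser *)p didMatchEither:(PKAssembly *)a {
    // The matched word token is on top of the assembly's stack.
    PKToken *tok = [a pop];
    NSString *str = tok.stringValue;

    // Sub-parse the word here instead of in the grammar.
    if ([str hasPrefix:@"pre"]) {
        // handle 'preview'
    } else {
        // handle 'review'
    }
}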

Also remember that ParseKit grammars support matching tokens via regular expressions. So if you want to match all words that end in 'view':

@start = anyView;
anyView = /\b\w*?view\b/;

Hope that helps.