Say I want to parse substrings with ParseKit, like the prefix of a word. So for example I want to parse 'preview' and 'review'. So my grammar might be:
@start = prefix 'view';
prefix = 'pre' | 're';
Now, without modifying ParseKit, I can match 'pre view' and 're view' but not 'preview' or 'review'. From looking at the documentation, I guess I need to customize PKTokenizer's word state, because it is looking for whitespace to terminate a 'Word' token. How do I get around that?
Developer of ParseKit here.
I'm not sure I fully understand the question, but I think it sounds somewhat misguided.
If you are looking for a way to match sub-tokens or characters, Regular Expressions might be a better fit for your needs than ParseKit.
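As a rough illustration of the regex route, here is a minimal sketch that uses Foundation's NSRegularExpression (not ParseKit) to pull the prefix out of a single word. The helper name PrefixOfViewWord is hypothetical and exists only for this example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of ParseKit: returns @"pre" or @"re" when
// `word` is exactly that prefix followed by "view", otherwise nil.
static NSString *PrefixOfViewWord(NSString *word) {
    if (word == nil) return nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^(pre|re)view$"
                                                  options:0
                                                    error:NULL];
    if (regex == nil) return nil;
    NSTextCheckingResult *match =
        [regex firstMatchInString:word
                          options:0
                            range:NSMakeRange(0, word.length)];
    if (match == nil) return nil;
    // Capture group 1 holds the prefix.
    return [word substringWithRange:[match rangeAtIndex:1]];
}
```

Called with @"preview" this returns @"pre", with @"review" it returns @"re", and with @"pre view" it returns nil.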
A ParseKit grammar matches against tokens produced by the ParseKit tokenizer (the PKTokenizer class), not individual characters.
It's not that it is impossible for PKTokenizer to produce a 'pre' and a 'view' token from input of 'preview'. But it would require customizing the tokenizer in a way that I would call unwise and unnecessarily complicated, so I think that is a bad idea.
If you want to use ParseKit (rather than Regex) anyway, you can simply do the sub-parsing in your assembler callbacks (instead of in the grammar).
So in the Grammar:
@start = either;
either = 'preview' | 'review';
And in ObjC:
- (void)parser:(PKParser *)p didMatchEither:(PKAssembly *)a {
    PKToken *tok = [a pop];
    NSString *str = tok.stringValue;
    if ([str hasPrefix:@"pre"]) {
        ... // handle 'preview'
    } else {
        ... // handle 'review'
    }
}
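If the callback needs both the prefix and the rest of the word, the sub-parsing can be factored into a small helper. This is only a sketch: SplitKnownPrefix is a hypothetical function, not a ParseKit API, and it uses nothing beyond Foundation's NSString.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of ParseKit: splits `word` into
// @[prefix, remainder] using the first entry of `prefixes` that `word`
// starts with. Returns nil when none of the prefixes applies.
static NSArray *SplitKnownPrefix(NSString *word, NSArray *prefixes) {
    for (NSString *prefix in prefixes) {
        if ([word hasPrefix:prefix]) {
            return @[prefix, [word substringFromIndex:prefix.length]];
        }
    }
    return nil;
}
```

Inside the callback, SplitKnownPrefix(tok.stringValue, @[@"pre", @"re"]) would give @[@"pre", @"view"] for 'preview' and @[@"re", @"view"] for 'review'. Prefixes are tried in the order given, so list longer ones first if one prefix can begin with another.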
Also remember that ParseKit grammars support matching tokens via RegEx. So if you want to match all words that end in 'view':
@start = anyView;
anyView = /\b\w*?view\b/;
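To see which words that pattern accepts, it can be tried outside the grammar. The sketch below runs the same pattern through Foundation's NSRegularExpression, on the assumption that it behaves the same as the regex engine ParseKit uses for this simple pattern. MatchesAnyView is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if the grammar's pattern matches the whole of
// `word`. Uses NSRegularExpression, not ParseKit's own matching.
static BOOL MatchesAnyView(NSString *word) {
    if (word == nil) return NO;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\b\\w*?view\\b"
                                                  options:0
                                                    error:NULL];
    NSRange whole = NSMakeRange(0, word.length);
    NSTextCheckingResult *match =
        [regex firstMatchInString:word options:0 range:whole];
    // Require the match to cover the entire word.
    return match != nil && NSEqualRanges(match.range, whole);
}
```

With this helper, 'preview', 'review', 'overview' and plain 'view' all match, while 'viewer' and 'views' do not, because the trailing \b requires the word to end right after 'view'.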
Hope that helps.