I am trying to build a grammar that will match on substrings of a word and am not having much luck. For example, matching on the text 'an' succeeds, but the grammar fails to match the first two letters of 'and'.
expr = phrase*;
phrase = an|text;
an = 'an';
text = Any;
I realize this is a basic example.
Creator of PEGKit here.
First, I want to say that from this brief description, I suspect that PEGKit may not be the best tool for this job.
PEGKit excels at matching at the token level, but is less useful for matching at the sub-token (character) level.
If you need to do a lot of purely sub-token matching as described here, Regular Expressions will be a much better solution, and you should use them instead of PEGKit.
However, if you need to check a few sub-token patterns in the context of a larger token-parsing problem, then yes, PEGKit can certainly accomplish that.
So to answer your specific question:
For this kind of sub-token matching in PEGKit, you should use a Semantic Predicate.
Semantic Predicates are described in the brief docs in the PEGKit readme. And here is a previous question on Stack Overflow related to the use of Semantic Predicates.
Semantic Predicates are Objective-C expressions embedded directly in your PEGKit grammars, which return a boolean value to indicate whether matching should succeed or fail. They are wrapped in a { ... }? construct.
In this case, you could use a Semantic Predicate to match the "prefix" of a matched Word token:
expr = anPhrase*;
anPhrase = { [LS(1) hasPrefix:@"an"] }? Word;
Here, anPhrase will only match Word tokens which start with 'an'.
The LS(1) macro (also described in the PEGKit readme) means "Lookahead String 1". It will fetch the string value of the first lookahead token as an NSString.
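To make the predicate's check concrete outside of a grammar, here is a minimal sketch in plain Objective-C. It assumes the token strings are already available as NSString values, which is what LS(1) hands to the predicate. WordsWithPrefix is a hypothetical helper written for illustration and is not part of PEGKit; only -hasPrefix: is the call the predicate actually uses.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of PEGKit): keeps only the words that
// begin with `prefix`, using the same -hasPrefix: check as the predicate.
static NSArray<NSString *> *WordsWithPrefix(NSArray<NSString *> *words, NSString *prefix) {
    NSMutableArray<NSString *> *matches = [NSMutableArray array];
    for (NSString *word in words) {
        // -hasPrefix: is case-sensitive, so "Ant" does not match "an".
        if ([word hasPrefix:prefix]) {
            [matches addObject:word];
        }
    }
    return [matches copy];
}
```

Calling WordsWithPrefix(@[@"an", @"and", @"banana", @"Ant"], @"an") keeps "an" and "and" and drops the other two. If you need a case-insensitive match, lowercase the string first (for example with -lowercaseString) before calling -hasPrefix:.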