ios objective-c pegkit

Match substrings within a PEGKit grammar


I am trying to build a grammar that will match on substrings of a word and am not having much luck. For example, I try to match on the text 'an', which succeeds, but it fails to match on the first two letters of 'and'.

expr = phrase*;
phrase = an|text;
an = 'an';
text = Any;

I realize this is a basic example.


Solution

  • Creator of PEGKit here.

    First, I want to say that from this brief description, I suspect that PEGKit may not be the best tool for this job.

    PEGKit excels at matching at the token level, but is less useful for matching at the sub-token (character) level.

    If you need to do a lot of purely sub-token matching as described here, Regular Expressions will be a much better solution, and you should use them instead of PEGKit.
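
    As a rough illustration of the regular-expression route, here is a minimal sketch using Foundation's NSRegularExpression, with no PEGKit involved. The helper name WordsWithPrefix is hypothetical, and the sketch assumes that "word" means a run of \w characters and that matching is case-sensitive.

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical helper (not part of PEGKit): returns every word in `text`
    // that starts with `prefix`, using a case-sensitive regular expression.
    static NSArray<NSString *> *WordsWithPrefix(NSString *text, NSString *prefix) {
        // Escape the prefix so it is matched literally, require a word
        // boundary before it, and consume the rest of the word after it.
        NSString *escaped = [NSRegularExpression escapedPatternForString:prefix];
        NSString *pattern = [NSString stringWithFormat:@"\\b%@\\w*", escaped];

        NSError *error = nil;
        NSRegularExpression *regex =
            [NSRegularExpression regularExpressionWithPattern:pattern
                                                      options:0
                                                        error:&error];
        if (!regex) {
            return @[]; // Invalid pattern: report no matches.
        }

        NSMutableArray<NSString *> *words = [NSMutableArray array];
        NSRange whole = NSMakeRange(0, text.length);
        for (NSTextCheckingResult *match in
             [regex matchesInString:text options:0 range:whole]) {
            [words addObject:[text substringWithRange:match.range]];
        }
        return words;
    }
    ```

    Calling WordsWithPrefix(@"an and band ant", @"an") collects an, and, and ant, and skips band because there is no word boundary before its "an".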

    However, if you need to check a few sub-token patterns in the context of a larger token-parsing problem, then yes, PEGKit can certainly accomplish that.


    So to answer your specific question:

    For this kind of sub-token matching in PEGKit, you should use a Semantic Predicate.

    Semantic Predicates are described in the brief docs in the PEGKit readme, and a previous question on Stack Overflow covers their use.

    Semantic Predicates are Objective-C expressions embedded directly in your PEGKit grammars, which return a boolean value to indicate whether matching should succeed or fail. They are wrapped in a { ... }? construct.

    In this case, you could use a Semantic Predicate to match the "prefix" of a matched Word token:

    expr = anPhrase*;
    anPhrase = { [LS(1) hasPrefix:@"an"] }? Word;
    

    Here, anPhrase will only match Word tokens which start with an.

    The LS(1) macro (also described in the PEGKit readme) means "Lookahead String 1". It will fetch the string value of the first lookahead token as an NSString.
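
    To see what the predicate is actually testing, here is the same check outside a grammar. This is only a sketch: StartsWithAn and StartsWithAnIgnoringCase are hypothetical helpers, and tokenString stands in for the NSString that LS(1) returns. Note that hasPrefix: is case-sensitive, so a word such as An would not match unless the string is lowercased first.

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical stand-in for the predicate body. `tokenString` plays the
    // role of the NSString that LS(1) would return for the lookahead token.
    static BOOL StartsWithAn(NSString *tokenString) {
        // Case-sensitive: "and" passes, "An" and "band" do not.
        return [tokenString hasPrefix:@"an"];
    }

    // Assumed variant for case-insensitive matching: lowercase the token
    // string before testing the prefix.
    static BOOL StartsWithAnIgnoringCase(NSString *tokenString) {
        return [tokenString.lowercaseString hasPrefix:@"an"];
    }
    ```

    If case-insensitive matching is what you want, the second form could be used as the predicate expression instead, applied to LS(1).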