PEGKit: combine matched symbols on the stack

I'm writing a grammar for PEGKit to parse a Twine-exported Twee file. This is my first time using PEGKit, and I'm trying to get to grips with how it works.

I have this Twee source file that I'm parsing:

:: Passage One
P1 Line One
P1 Line Two

:: Passage Two
P2 Line One
P2 Line Two

Currently, I've worked out how to parse the above using the following grammar:

@before {
    PKTokenizer *t = self.tokenizer;
    [t.symbolState add:@"::"];
    [t.commentState addSingleLineStartMarker:@"::"];

    // New lines as symbols
    [t.whitespaceState setWhitespaceChars:NO from:'\n' to:'\n'];
    [t.whitespaceState setWhitespaceChars:NO from:'\r' to:'\r'];
    [t setTokenizerState:t.symbolState from:'\n' to:'\n'];
    [t setTokenizerState:t.symbolState from:'\r' to:'\r'];
}

start                   = passage+;
passage                 = passageTitle contentLine*;
passageTitle            = passageStart Word+ eol+;
contentLine             = singleLine eol+;
singleLine              = Word+;
passageStart            = '::'!;
eol                     = '\n'! | '\r'!;

and the result I get is:

[Passage, One, P1, Line, One, P1, Line, Two, Passage, Two, P2, Line, One, P2, Line, Two]::/Passage/One/
/P1/Line/One/
/P1/Line/Two/
/
/::/Passage/Two/
/P2/Line/One/
/P2/Line/Two/
^

Ideally, I'd like the parser to combine the words matched for a passageTitle into a single string, similar to how PEGKit's built-in QuotedString works. I'd also like the words matched for each contentLine to be combined the same way.

So, eventually, I would have this on the stack:

[Passage One, P1 Line One, P1 Line Two, Passage Two, P2 Line One, P2 Line Two]

Any thoughts on how to achieve this would be appreciated.

Solution

Creator of PEGKit here.

I understand your ultimate strategy (to collect/combine lines as single string objects) and agree that it makes sense. However, I disagree with your proposed tactic for achieving it (altering tokenization to try to combine what are essentially multiple separate tokens into single tokens).

Combining lines into convenient string objects makes sense, but altering tokenization to achieve that doesn't make sense IMO (at least not with a recursive-descent parsing kit like PEGKit) when the lines in question don't have obvious 'bracketing' characters such as quotes or brackets.

You could treat the passageTitle lines starting with :: as single-line Comment tokens, but I probably wouldn't, since I gather they aren't semantically comments.

So instead of merging multiple tokens via the tokenizer, you should merge multiple tokens in the more natural way for PEGKit: in the parser delegate callbacks.

We have two different cases to deal with here:

1. The passageTitle lines
2. The contentLine lines

In your grammar's @before block, remove this line so that passage titles are no longer treated as Comment tokens (that wasn't configured completely correctly anyhow, but never mind that):

[t.commentState addSingleLineStartMarker:@"::"];

Also in your grammar, remove the ! from your passageStart rule so that the :: tokens are not discarded. They stay on the stack, where the title callback can use them as a fence:

passageStart            = '::';

That's all for the grammar. Now, in your parser delegate, implement the two callback methods for the title and content lines. In each callback, pop the relevant tokens off the PKAssembly's stack and merge them into a single string. Merge them in reverse, because objectsAbove: returns them top of stack first.

#import <PEGKit/PEGKit.h>

@interface TweeDelegate : NSObject
@end

@implementation TweeDelegate

- (void)parser:(PKParser *)p didMatchPassageTitle:(PKAssembly *)a {
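    // Pop every token above the '::' fence; objectsAbove: leaves the fence itself on the stack
    PKToken *fence = [PKToken tokenWithTokenType:PKTokenTypeSymbol stringValue:@"::" doubleValue:0.0];
    NSArray *toks = [a objectsAbove:fence];
    [a pop]; // discard `::`

    NSMutableString *buf = [NSMutableString string];

    // toks is in pop order (top of stack first), so walk it backwards to restore source order
    for (PKToken *tok in [toks reverseObjectEnumerator]) {
        [buf appendFormat:@"%@ ", tok.stringValue];
    }

    // Drop the trailing space; the __bridge cast is required under ARC
    CFStringTrimWhitespace((__bridge CFMutableStringRef)buf);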

    NSLog(@"Title: %@", buf); // Passage One
}

- (void)parser:(PKParser *)p didMatchContentLine:(PKAssembly *)a {
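    // The title callback and earlier content-line callbacks have already emptied the stack,
    // so everything left on it belongs to this line
    NSArray *toks = [a objectsAbove:nil];

    NSMutableString *buf = [NSMutableString string];

    // Walk backwards to restore source order, as in the title callback
    for (PKToken *tok in [toks reverseObjectEnumerator]) {
        [buf appendFormat:@"%@ ", tok.stringValue];
    }

    // Drop the trailing space; the __bridge cast is required under ARC
    CFStringTrimWhitespace((__bridge CFMutableStringRef)buf);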

    NSLog(@"Content: %@", buf); // P1 Line One
}

@end

I receive the following output:

Title: Passage One
Content: P1 Line One
Content: P1 Line Two
Title: Passage Two
Content: P2 Line One
Content: P2 Line Two

As for what to do with these strings once you have created them, I'll leave that up to you :). Hope that helps.