
PegKit string interpolation

I'm using PegKit to build a simple domain-specific interpreted language.

I essentially have everything working other than interpolated strings.

The idea is to achieve some kind of rule like this:

atom = Number | stringLiteral | referenceType;
stringLiteral = "'"! (~"'" | "{"! expression "}"!)*  "'"!;
referenceType = Word ('.' Word)*;

where the 'expression' production is defined already.

I've inserted some logic here that builds up a string from the tokens I need. If we come across an expression, I evaluate it and add it to the string that's being built.
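Here is a minimal sketch of that string-building logic. It is in Python rather than Objective-C and assumes the tokenizer delivers the apostrophes and braces as separate Symbol tokens. Tokens are modelled as hypothetical `(kind, text)` pairs. `build_interpolated` and `lookup` are illustrative names, not PEGKit APIs, and `lookup` only stands in for evaluating the `expression` production.

```python
from typing import Callable, List, Tuple

# Hypothetical token model: (kind, text), e.g. ("Symbol", "'") or ("Word", "hello").
Token = Tuple[str, str]

QUOTE: Token = ("Symbol", "'")
OPEN: Token = ("Symbol", "{")
CLOSE: Token = ("Symbol", "}")


def build_interpolated(tokens: List[Token], pos: int,
                       evaluate: Callable[[List[Token]], str]) -> Tuple[str, int]:
    """Consume one interpolated string starting at tokens[pos].

    Returns the built string and the index just past the closing apostrophe.
    Nested braces are not supported in this sketch.
    """
    if pos >= len(tokens) or tokens[pos] != QUOTE:
        raise SyntaxError("expected opening apostrophe")
    pos += 1
    parts: List[str] = []
    while True:
        if pos >= len(tokens):
            raise SyntaxError("unterminated string literal")
        tok = tokens[pos]
        if tok == QUOTE:
            return "".join(parts), pos + 1
        if tok == OPEN:
            # Gather the expression's tokens up to the next "}" and evaluate them.
            end = pos + 1
            while end < len(tokens) and tokens[end] != CLOSE:
                end += 1
            if end == len(tokens):
                raise SyntaxError("unterminated interpolation")
            parts.append(evaluate(tokens[pos + 1:end]))
            pos = end + 1
        else:
            # Any other token is literal text inside the string.
            parts.append(tok[1])
            pos += 1


def lookup(expr: List[Token]) -> str:
    """Hypothetical stand-in for evaluating the `expression` production."""
    env = {"name": "world"}
    return env[expr[0][1]]
```

Because the sketch concatenates token text, spaces inside the string only survive if they reach the rule as tokens. Whether they do depends on how the tokenizer is configured.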

The atom and reference type productions are parsing perfectly.

But if I try to parse something like 'hello', the token produced when the atom rule runs is always of the built-in Word type.

I've tried replacing the single quote with dollar signs and other character combinations to represent the start and end of strings but it never matches.

Any ideas?



  • Creator of PEGKit here.

    Are you sure that the erroneous 'hello' tokens produced are of type Word? I suspect they may actually be of type QuotedString. The default behavior of PKTokenizer is to produce a QuotedString token for any single- or double-quoted string.

    To achieve the result you're looking for, you must change the tokenizer state that PKTokenizer uses for the apostrophe (single quote). By default this is PKQuoteState. You need to change it to PKSymbolState (the tokenizer's -symbolState property) so that apostrophes are recognized as single-character tokens of type Symbol, instead of as the beginning of a multi-character token of type QuotedString.

    You can do this in an Action at the top of your grammar (or wherever you are configuring your tokenizer):

    @before {
        PKTokenizer *t = self.tokenizer;
        // Treat ' as a single-character Symbol instead of the start of a QuotedString.
        [t setTokenizerState:t.symbolState from:'\'' to:'\''];
    }

    Now apostrophes will be tokenized as single-character Symbol tokens.
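The tokenizer-state change can be made concrete with a toy Python model. This is not PKTokenizer's implementation. In the model, the first character of each token selects the state that reads it. `tokenize`, `default_state`, `QUOTE_STATE` and `SYMBOL_STATE` are hypothetical names. The only idea taken from the answer is that remapping the apostrophe from a quote state to the symbol state turns one QuotedString token into separate Symbol and Word tokens.

```python
from typing import Dict, List, Tuple

# Hypothetical token model: (kind, text).
Token = Tuple[str, str]


def default_state(ch: str) -> str:
    """Pick a state for characters that have no explicit override."""
    if ch.isalpha():
        return "word"
    if ch.isspace():
        return "whitespace"
    return "symbol"


def tokenize(text: str, overrides: Dict[str, str]) -> List[Token]:
    """Toy tokenizer: the first character of each token selects its state."""
    tokens: List[Token] = []
    i = 0
    while i < len(text):
        ch = text[i]
        state = overrides.get(ch) or default_state(ch)
        if state == "whitespace":
            i += 1  # whitespace is silently skipped
        elif state == "word":
            # Always consume the first character, then any following letters.
            j = i + 1
            while j < len(text) and text[j].isalpha():
                j += 1
            tokens.append(("Word", text[i:j]))
            i = j
        elif state == "quote":
            # Consume through the matching closing quote (or to the end of input).
            close = text.find(ch, i + 1)
            end = len(text) if close == -1 else close + 1
            tokens.append(("QuotedString", text[i:end]))
            i = end
        else:
            tokens.append(("Symbol", ch))  # single-character symbol token
            i += 1
    return tokens


# Models the default described in the answer: both quote characters start a QuotedString.
QUOTE_STATE = {"'": "quote", '"': "quote"}
# The apostrophe remapped to the symbol state; double quotes are left alone.
SYMBOL_STATE = {"'": "symbol", '"': "quote"}
```

With the apostrophe remapped, `'a {x}'` comes out as `'`, `a`, `{`, `x`, `}`, `'`, which is the token shape the `stringLiteral` rule expects. In this toy the space after `a` is dropped because whitespace is skipped.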