PegKit string interpolation


I'm using PegKit to build a simple domain-specific interpreted language.

I essentially have everything working other than interpolated strings.

The idea is to achieve some kind of rule like this:

atom = Number | stringLiteral | referenceType;
stringLiteral = "'"! (~"'" | "{"! expression "}"!)*  "'"!;
referenceType = Word ('.' Word)*;

where the 'expression' production is defined already.

I've inserted some logic here that builds up a string from the tokens I need. If we come across an expression, I evaluate it and add it to the string that's being built.

The atom and reference type productions are parsing perfectly.

But if I try to parse something like 'hello', the token produced when the atom rule runs is always of the built-in Word type.

I've tried replacing the single quotes with dollar signs and other character combinations to mark the start and end of strings, but it never matches.

Any ideas?

Cheers


Solution

  • Creator of PEGKit here.

    Are you sure that the erroneous 'hello' tokens produced are of type Word? I suspect they may actually be of type QuotedString… By default, PKTokenizer produces a QuotedString token for any single- or double-quoted string.

    To achieve the result you're looking for, you must change the tokenizer state that PKTokenizer uses for the apostrophe (single quote). By default, this is PKQuoteState, but you will need to change it to PKSymbolState (the tokenizer's -symbolState property) so that apostrophes are recognized as single-character tokens of type Symbol instead of the beginning of a multi-character token of type QuotedString.

    You can do this in an Action at the top of your grammar (or wherever you are configuring your tokenizer):

    @before {
        PKTokenizer *t = self.tokenizer;
        [t setTokenizerState:t.symbolState from:'\'' to:'\''];
    }
    

    Now apostrophes will be tokenized as single-character Symbol tokens.
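
    One way to check the effect outside the grammar is to run the tokenizer on its own and print each token's type. The following is only a minimal sketch: it assumes PEGKit is linked into the target and that its umbrella header is `<PEGKit/PEGKit.h>`, and the input string `'hello'` is just an illustration.

    ```objc
    #import <Foundation/Foundation.h>
    #import <PEGKit/PEGKit.h>  // assumption: PEGKit framework is linked

    int main(void) {
        @autoreleasepool {
            // Illustrative input: a word wrapped in apostrophes.
            PKTokenizer *t = [PKTokenizer tokenizerWithString:@"'hello'"];

            // Route the apostrophe to the symbol state instead of the quote state.
            [t setTokenizerState:t.symbolState from:'\'' to:'\''];

            // Pull tokens until the shared EOF token is returned.
            PKToken *eof = [PKToken EOFToken];
            PKToken *tok = nil;
            while ((tok = [t nextToken]) != eof) {
                NSLog(@"%@  symbol=%d word=%d quoted=%d",
                      tok.stringValue,
                      [tok isSymbol], [tok isWord], [tok isQuotedString]);
            }
        }
        return 0;
    }
    ```

    With the state change in place, the apostrophes should come back as Symbol tokens and `hello` as a Word. Commenting out the `setTokenizerState:from:to:` line should instead yield a single QuotedString token, which is the default behavior described above.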