I am working on lexing a custom JavaScript-based language (it transpiles to JS), and I am stuck on how to structure the template literals, which are identical to JavaScript's.
Currently I lex strings as a complete token (including escapes), but I cannot do that for template strings due to the embedded expressions.
The language allows multiple statements on a single line (i.e. whitespace is ignored), such as the following:
let foo = {} let bar = `a template literal with embedded ${foo} expression`
I could create a token matching the backtick, but then I would need to create a token matching "string text" which would include escapes etc. (without it conflicting with other expression tokens).
I was hoping to match with the regex ([`}])[^`\\$\n]*(`|\$\{)
(escapes omitted for brevity), but that would fail when the template follows an object literal (as in the example), because the } that closes the object would also match as the start of a template chunk.
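To make the failure concrete, here is a minimal sketch that hand-rolls the regex above as a plain Rust function, so it needs no regex crate. The name match_template_chunk is hypothetical and this is not how logos evaluates patterns; it only mirrors the three parts of the regex, anchored at a given byte offset.

```rust
/// Hand-rolled equivalent of the regex ([`}])[^`\\$\n]*(`|\$\{),
/// anchored at byte offset `start`. Returns the matched slice, if any.
/// (Hypothetical helper for illustration only.)
fn match_template_chunk(src: &str, start: usize) -> Option<&str> {
    let bytes = src.as_bytes();

    // ([`}]) : the chunk must open with a backtick or a closing brace.
    match bytes.get(start) {
        Some(b'`') | Some(b'}') => {}
        _ => return None,
    }

    // [^`\\$\n]* : consume everything up to a backtick, backslash, dollar or newline.
    let mut i = start + 1;
    while i < bytes.len() && !matches!(bytes[i], b'`' | b'\\' | b'$' | b'\n') {
        i += 1;
    }

    // (`|\$\{) : the chunk must end with a backtick or with ${.
    if bytes.get(i) == Some(&b'`') {
        Some(&src[start..=i])
    } else if bytes.get(i) == Some(&b'$') && bytes.get(i + 1) == Some(&b'{') {
        Some(&src[start..i + 2])
    } else {
        None
    }
}
```

Applied to the example line at the offset of the } that closes the object, this returns "} let bar = `", so ordinary code between the object and the template is swallowed as if it were template text.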
For reference, I am using the Rust logos library to tokenise, but I think this is more an issue of not knowing how to begin implementing the tokens than of the specific library.
In order to lexically analyze template strings like
let bar = `a template literal with embedded ${foo} expression`
you'll need a scanner generator that allows multiple scanner states, because the lexical context inside the template is very different from the context in the rest of the program.
That's a very common feature -- lex has had it basically forever -- but I'm not at all familiar with the scanner generator you are using and the easily searched documentation wasn't sufficiently clear for me to tell whether it implements states, much less the syntax for using them.
You only need two states for this problem, but since the syntax is recursive you need to back that up with a state stack of some kind.
Basically, you have your normal state, in which there are rules for every token in the language. One of those tokens will be `, which pushes the current state onto the state stack and changes to the "inside template" state. The only other modification is that { and } need to push and pop the stack. ({ pushes the current state onto the stack but does not change state.)
In the "inside template" state, there are only a few token types:
`: pops the state stack (and returns a ` token).
${: pushes the current state onto the state stack and switches to the normal state.
That's a very rough outline.
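The outline above can be sketched as a small hand-written lexer in Rust. This is only an illustration under assumptions: it does not use logos, the names Mode, Token and lex are hypothetical, every ordinary token in the normal state is lumped into a placeholder Other variant, and a TemplateText token for the literal text is added for completeness.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Normal,
    Template,
}

#[derive(Debug, Clone, PartialEq)]
enum Token {
    Backtick,
    DollarBrace,          // ${
    LBrace,
    RBrace,
    TemplateText(String), // raw text inside a template, escapes kept as-is
    Other(String),        // placeholder for every other normal-state token
}

fn lex(src: &str) -> Vec<Token> {
    let chars: Vec<char> = src.chars().collect();
    let mut mode = Mode::Normal;
    let mut stack: Vec<Mode> = Vec::new();
    let mut tokens = Vec::new();
    let mut i = 0;

    while i < chars.len() {
        match mode {
            Mode::Normal => match chars[i] {
                // ` pushes the current state and enters the template state.
                '`' => { stack.push(mode); mode = Mode::Template; tokens.push(Token::Backtick); i += 1; }
                // { pushes the current state but does not change it.
                '{' => { stack.push(mode); tokens.push(Token::LBrace); i += 1; }
                // } pops: back to Template if this brace closes a ${, else Normal.
                '}' => { mode = stack.pop().unwrap_or(Mode::Normal); tokens.push(Token::RBrace); i += 1; }
                c if c.is_whitespace() => i += 1,
                _ => {
                    let start = i;
                    while i < chars.len()
                        && !chars[i].is_whitespace()
                        && !matches!(chars[i], '`' | '{' | '}')
                    {
                        i += 1;
                    }
                    tokens.push(Token::Other(chars[start..i].iter().collect()));
                }
            },
            Mode::Template => {
                if chars[i] == '`' {
                    // Closing backtick pops the state stack.
                    mode = stack.pop().unwrap_or(Mode::Normal);
                    tokens.push(Token::Backtick);
                    i += 1;
                } else if chars[i] == '$' && chars.get(i + 1) == Some(&'{') {
                    // ${ pushes the template state and switches to normal.
                    stack.push(mode);
                    mode = Mode::Normal;
                    tokens.push(Token::DollarBrace);
                    i += 2;
                } else {
                    let start = i;
                    while i < chars.len() {
                        match chars[i] {
                            '`' => break,
                            '$' if chars.get(i + 1) == Some(&'{') => break,
                            // Keep a backslash together with the character it escapes.
                            '\\' if i + 1 < chars.len() => i += 2,
                            _ => i += 1,
                        }
                    }
                    tokens.push(Token::TemplateText(chars[start..i].iter().collect()));
                }
            }
        }
    }
    tokens
}
```

Because } simply restores whatever state was pushed by its matching { or ${, the same rule handles object literals inside an embedded expression and templates nested inside templates. A real lexer would also report unbalanced braces and unterminated templates, which this sketch silently tolerates.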