What's the best practice for dealing with complex literals in Rascal?
Two examples from JavaScript (my DSL has similar cases):
\ escapes - have to be unescaped into the actual value.
implode refuses to map lexicals to abstract trees; they are obviously handled differently from syntax productions, despite having complete parse trees available. For example, the following parser fails with IllegalArgument("Missing lexical constructor"):
module lexicals
import Prelude;
lexical Char = "\\" ![] | ![\\]; // potentially escaped character
lexical String = "\"" Char* "\""; // if I make this "syntax", implode works as expected
start syntax Expr = string: String;
data EXPR = string(list[str] chars);
void main(list[str] args) {
  str text = "\"Hello\\nworld\"";
  print(implode(#EXPR, parse(#Expr, text)));
}
The only idea I have so far is to capture all lexicals as raw strings and later re-parse them (implode and all) using separately defined syntaxes without layout whitespace. Hopefully, there's a better way.
The way implode converts a parse tree into an AST is documented in the Rascal tutor: implode. This contains the following rule:
Unlabeled lexicals are imploded to str, int, real, bool depending on the expected type in the ADT. To implode lexical into types other than str, the PDB parse functions for integers and doubles are used. Boolean lexicals should match "true" or "false". NB: lexicals are imploded this way, even if they are ambiguous.
So, solution 1 is to add a label to your production:
lexical String = string: "\"" Char* "\"";
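The quoted rule also suggests a variant that leaves the grammar untouched: keep String unlabeled and let the ADT expect a single str in that position. The following is only a sketch of that reading of the rule, not something stated in the answer; the field name raw and the changed shape of EXPR are assumptions made for illustration.

```rascal
module lexicals

import ParseTree;
import IO;

lexical Char = "\\" ![] | ![\\];   // potentially escaped character
lexical String = "\"" Char* "\"";  // unlabeled, so implode maps it to a str
start syntax Expr = string: String;

// Assumption: the AST keeps the literal as one raw str instead of list[str].
data EXPR = string(str raw);

void main(list[str] args) {
  println(implode(#EXPR, parse(#Expr, "\"Hello\\nworld\"")));
}
```

The str obtained this way is the source text of the literal, quotes and backslashes included, so any unescaping still has to happen in a later step.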
Also, perhaps you do not need to have an AST next to your parse tree? At least not one that has to closely match your grammar. The two common scenarios are:
implode function.
We are leaning more and more towards deprecating the implode function, since our concrete syntax is powerful enough for most cases.
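As a rough illustration of working on the parse tree directly, the sketch below matches the Expr production with a concrete syntax pattern and decodes the literal from its source text, without calling implode. The helper names unescape and eval are hypothetical, and the escape handling is deliberately minimal: only \n and \t are treated specially, any other escaped character stands for itself.

```rascal
module lexicals

import ParseTree;
import String;
import IO;

lexical Char = "\\" ![] | ![\\];   // potentially escaped character
lexical String = "\"" Char* "\"";
start syntax Expr = string: String;

// Hypothetical helper: decode the raw text of a literal, quotes included.
str unescape(str raw) {
  str body = substring(raw, 1, size(raw) - 1); // strip the surrounding quotes
  str result = "";
  int i = 0;
  while (i < size(body)) {
    if (body[i] == "\\" && i + 1 < size(body)) {
      str next = body[i + 1];
      if (next == "n") {
        result += "\n";
      } else if (next == "t") {
        result += "\t";
      } else {
        result += next;  // assumption: unknown escapes map to the character itself
      }
      i += 2;
    } else {
      result += body[i];
      i += 1;
    }
  }
  return result;
}

// Match the production with a concrete pattern and read the lexical's text.
str eval((Expr) `<String s>`) = unescape("<s>");

void main(list[str] args) {
  println(eval(parse(#Expr, "\"Hello\\nworld\"")));
}
```

Because the lexical is turned into text with string interpolation, no separate re-parse is needed; a richer literal (one that needs its own sub-structure) could instead be matched with further concrete patterns on its parts.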