I've recently been working on a custom programming language using PEG.js. I made a system that recognises variable names and evaluates variable values, supporting access to object/array properties.
Global variables (glob):
{
  "null":null,
  "undefined":undefined,
  test:{
    foobar:'it worked'
  }
}
When I type test, it evaluates to {foobar:"it worked"} as expected.
But when I type test["foobar"], it should return "it worked", but instead I get this error:
Error: Variable 'test[' does not exist.
My PEG.js grammar:
Getvar
  = name:Varname path:('[' _ exp:(String/Integer) _ ']' {return exp;})* {
      let rt=glob[name];
      if(rt===undefined&&name!='undefined'&&name!='null')
        error(`Variable '${name}' does not exist.`);
      for(let p of path)rt=rt[p];
      return rt;
    }

Varname "variable name"
  = [A-z0-9]+{
      if(!/[A-z]+/.test(text()))
        error(`Variable name must contain at least one letter. (reading '${text()}')`);
      return text();
    }

String "string"
  = '"' chars:DoubleStringCharacter* '"' { return chars.join(''); }
  / "'" chars:SingleStringCharacter* "'" { return chars.join(''); }

DoubleStringCharacter
  = !('"' / "\\") char:. { return char; }
  / "\\" sequence:EscapeSequence { return sequence; }

SingleStringCharacter
  = !("'" / "\\") char:. { return char; }
  / "\\" sequence:EscapeSequence { return sequence; }

EscapeSequence
  = "'"
  / '"'
  / "\\"
  / "b" { return "\b"; }
  / "f" { return "\f"; }
  / "n" { return "\n"; }
  / "r" { return "\r"; }
  / "t" { return "\t"; }
  / "v" { return "\x0B"; }

Integer "integer"
  = _ [0-9]+ { return parseInt(text(), 10); }

_ "whitespace"
  = [ \t\n\r]*
Since the variable name has the pattern [A-z0-9]+, I have no idea why [ passes as a variable name. As I was playing around trying to figure out what's going on, I discovered that the pattern somehow matches A-z (letters), 0-9 (numbers), but also [ and ].
Does anyone know why this is happening?
I don't fully understand why the generated parser behaves the way it does with a quantified pattern like the one in your variable-name rule. However, if I replace your Varname with this:
Varname "variable name"
  = vfirst: Vstart vrest: Vtail* {
      let rv = vfirst + vrest.join("");
      return rv;
    }

Vstart = sc: [A-Z]i { return sc; }
Vtail = tc: [A-Z0-9]i { return tc; }
then it works as expected. Oh, plus at the end of the start rule I added _.
Again, I don't know why this works, but the docs mention that the quantifiers do not backtrack. My instinctive (but not informed) reaction is to consider that kind of broken. In the change I made above, the quantifier is on the rule instead of the pattern, and each pattern can only match one character or nothing.
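For anyone wondering what "do not backtrack" means in practice, here is a rough sketch in plain JavaScript. The lit, star and seq helpers are hypothetical mini-combinators I made up for illustration; they are not part of PEG.js, and the real generated parser is more involved. The point is only that a PEG-style star takes everything it can and never gives a character back, whereas a regex quantifier will.

```javascript
// Hypothetical mini-combinators (not PEG.js API). Each parser takes
// (input, pos) and returns the new position, or -1 on failure.
const lit = (s) => (input, pos) =>
  input.startsWith(s, pos) ? pos + s.length : -1;

// PEG-style star: consume as many repetitions as possible and never give any back.
const star = (p) => (input, pos) => {
  for (;;) {
    const next = p(input, pos);
    if (next === -1 || next === pos) return pos;
    pos = next;
  }
};

// Sequence: run each parser in order, failing as soon as one fails.
const seq = (...ps) => (input, pos) => {
  for (const p of ps) {
    pos = p(input, pos);
    if (pos === -1) return -1;
  }
  return pos;
};

// Roughly the PEG rule:  "a"* "a"
const pegRule = seq(star(lit("a")), lit("a"));

const pegResult = pegRule("aaa", 0);     // -1: the star ate every "a"
const regexResult = /^a*a$/.test("aaa"); // true: the regex backtracks one step

console.log(pegResult, regexResult);
```

So with a greedy, non-backtracking quantifier, whatever the pattern is willing to match gets swallowed before the next part of the rule ever sees it.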
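Edit: leaving the above for historical interest, but the real problem was the pattern. In a character class, the range A-z covers every character code from "A" (65) to "z" (122), which includes the six characters that sit between "Z" and "a": [ \ ] ^ _ and `. That is why Varname swallowed the [ after test. [A-Z0-9]i will work better. (Note that the OP figured this out, not me.)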
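A quick way to see this outside the grammar is with ordinary JavaScript regular expressions, which treat a character-class range the same way (by character code). This is just a sanity check I would run in Node, not something taken from the PEG.js docs; the variable names are mine.

```javascript
// Collect every character that the range A-z covers but that is not a letter.
const extras = [];
for (let code = "A".charCodeAt(0); code <= "z".charCodeAt(0); code++) {
  const ch = String.fromCharCode(code);
  if (!/[A-Za-z]/.test(ch)) extras.push(ch);
}
console.log(extras.join(" ")); // [ \ ] ^ _ `

// The same input as in the question.
const input = 'test["foobar"]';

// The original class also accepts "[", so the name runs one character too far.
const buggy = /^[A-z0-9]+/.exec(input)[0];

// A case-insensitive A-Z range stops at the bracket.
const fixed = /^[A-Z0-9]+/i.exec(input)[0];

console.log(buggy); // test[
console.log(fixed); // test
```

The first result matches the name in the error message from the question: once Varname has consumed test[, the next character is a double quote, so the bracket part of Getvar matches zero times and the lookup is done on 'test['.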