
Selector in PEG.js grammar accepting what it shouldn't


I've recently been working on a custom programming language using PEG.js. I wrote rules that recognise variable names and evaluate their values, including access to object and array properties.

Global variables (glob):

{
  "null":null,
  "undefined":undefined,
  test:{
    foobar:'it worked'
  }
}

When I type test, it evaluates to {foobar:"it worked"}, as expected. But when I type test["foobar"], which should return "it worked", I get this error instead:

Error: Variable 'test[' does not exist.

My PEG.js grammar:

Getvar
  = name:Varname path:('[' _ exp:(String/Integer) _ ']' { return exp; })* {
      let rt = glob[name];
      if (rt === undefined && name != 'undefined' && name != 'null')
        error(`Variable '${name}' does not exist.`);

      for (let p of path) rt = rt[p];
      return rt;
    }

Varname "variable name"
  = [A-z0-9]+ {
      if (!/[A-z]+/.test(text()))
        error(`Variable name must contain at least one letter. (reading '${text()}')`);
      return text();
    }

String "string"
  = '"' chars:DoubleStringCharacter* '"' { return chars.join(''); }
  / "'" chars:SingleStringCharacter* "'" { return chars.join(''); }

DoubleStringCharacter
  = !('"' / "\\") char:. { return char; }
  / "\\" sequence:EscapeSequence { return sequence; }

SingleStringCharacter
  = !("'" / "\\") char:. { return char; }
  / "\\" sequence:EscapeSequence { return sequence; }

EscapeSequence
  = "'"
  / '"'
  / "\\"
  / "b"  { return "\b";   }
  / "f"  { return "\f";   }
  / "n"  { return "\n";   }
  / "r"  { return "\r";   }
  / "t"  { return "\t";   }
  / "v"  { return "\x0B"; }

Integer "integer"
  = _ [0-9]+ { return parseInt(text(), 10); }

_ "whitespace"
  = [ \t\n\r]*
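Stripped of the parser, the Getvar action boils down to the lookup below. This is a hypothetical stand-alone sketch: the function name getvar is made up for illustration, and it throws a plain Error where the grammar calls PEG.js's error(). It shows that the path walk itself is fine once Varname hands over the right name.

```javascript
// Hypothetical stand-alone version of the Getvar action (illustration only).
// `glob` mirrors the question's global variables.
const glob = {
  "null": null,
  "undefined": undefined,
  test: {
    foobar: "it worked",
  },
};

// name: the string Varname returned
// path: the keys collected from the [...] suffixes, in order
function getvar(name, path) {
  let rt = glob[name];
  if (rt === undefined && name !== "undefined" && name !== "null") {
    // The grammar calls PEG.js's error() here; a plain Error stands in for it.
    throw new Error(`Variable '${name}' does not exist.`);
  }
  // Walk each bracketed key, e.g. test["foobar"] -> glob.test.foobar
  for (const p of path) rt = rt[p];
  return rt;
}

console.log(getvar("test", ["foobar"])); // it worked
```

If the name arrives as "test[" instead of "test", glob["test["] is undefined and the error from the question is raised, so the fault has to be in what Varname consumes.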

Since the variable name has the pattern [A-z0-9]+, I have no idea why [ passes as part of a variable name. While experimenting to figure out what's going on, I discovered that the pattern matches letters (A-z) and digits (0-9), but somehow also [ and ].
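The behaviour can be reproduced outside PEG.js, because a character-class range such as A-z is compared by code point in a JavaScript regular expression just as in a PEG.js class. The sketch below (readVarname is a hypothetical helper) anchors the regex at the start of the input and takes the longest run it matches, the way a greedy PEG repetition does.

```javascript
// What Varname's [A-z0-9]+ consumes, reproduced with a plain JS regex.
// ^ anchors the match at the start of the input, like a PEG rule.
const varnamePrefix = /^[A-z0-9]+/;

// Hypothetical helper: returns the matched prefix, or null if nothing matches.
function readVarname(input) {
  const m = varnamePrefix.exec(input);
  return m ? m[0] : null;
}

console.log(readVarname('test["foobar"]')); // test[
console.log(readVarname("test")); // test
```

The match stops only at the double quote, which lies outside A-z. That leaves name set to "test[", which is exactly the name in the error message, and the '[' that the path rule expects has already been consumed.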

Does anyone know why this is happening?


Solution

  • I don't fully understand why the generated parser behaves the way it does with quantified rules like the one for the variable name. However, if I replace your Varname with this:

    Varname "variable name"
    = vfirst: Vstart vrest: Vtail* {
      let rv = vfirst + vrest.join("");
      return rv;
    }
    
    Vstart = sc: [A-Z]i { return sc; }
    
    Vtail = tc: [A-Z0-9]i { return tc; }
    

    then it works as expected. I also added _ at the end of the start rule.

    Again, I don't know why this works, but the docs mention that quantifiers do not backtrack. My instinctive (but uninformed) reaction is to consider that somewhat broken. In the change above, the quantifier applies to a rule instead of to the pattern, and each pattern can only match one character or nothing.


    Edit: I'm leaving the above for historical interest, but the real problem was the pattern. In a character class, A-z covers every character from "A" to "z", which includes the six characters between "Z" and "a": [ \ ] ^ _ and `. [A-Z0-9]i works better. (Note that the OP figured this out, not me.)
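A quick way to confirm this is to list the code points that A-z spans beyond the letters, then compare the original class with the suggested one. This sketch uses plain JavaScript regular expressions as a stand-in for the PEG.js classes; the /i flag plays the role of the i suffix in [A-Z0-9]i.

```javascript
// "Z" is code point 90 and "a" is 97, so A-z also spans 91..96.
const extra = [];
for (let cp = "Z".codePointAt(0) + 1; cp < "a".codePointAt(0); cp++) {
  extra.push(String.fromCodePoint(cp));
}
console.log(extra.join(" ")); // [ \ ] ^ _ `

// Original class versus the suggested case-insensitive one.
const original = /^[A-z0-9]+$/;
const fixed = /^[A-Z0-9]+$/i;

for (const name of ["test", "Test2", "test[", "a_b"]) {
  console.log(name, original.test(name), fixed.test(name));
}
// test true true
// Test2 true true
// test[ true false
// a_b true false
```

The letter check inside the original Varname action, /[A-z]+/, has the same flaw: a name consisting only of "_" would pass it as containing a letter.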