Tags: parsing, grammar, keyword, lexer, jison

Jison Lexer - Detect Certain Keyword as an Identifier at Certain Times


"end"  { return 'END'; }
...
0[xX][0-9a-fA-F]+ { return 'NUMBER'; }
[A-Za-z_$][A-Za-z0-9_$]* { return 'IDENT'; }
...
Call
  : IDENT ArgumentList
    {{ $$ = ['CallExpr', $1, $2]; }}
  | IDENT
    {{ $$ = ['CallExprNoArgs', $1]; }}
  ;

CallArray
  : CallElement
    {{ $$  = ['CallArray', $1]; }}
  ;

CallElement
  : CallElement "." Call
     {{ $$ = ['CallElement', $1, $3]; }}
  | Call
  ;

Hello! In my grammar, I want "res.end();" to treat end as an identifier (IDENT) rather than as the END keyword. I've been thinking about this one for a while but couldn't solve it. Does anyone have any ideas? Thank you!

edit: It's a C-like programming language.


Solution

  • There's not quite enough information in the question to justify the assumptions I'm making here, so this answer may be inexact.

    Let's suppose we have a somewhat Lua-like language in which a.b is syntactic sugar for a["b"]. Furthermore, since the . must be followed by a lexical identifier -- in other words, it is never followed by a syntactic keyword -- we'd like to inhibit keyword recognition in this context.

    That's a pretty simple rule. It's simple enough that the lexer could implement it without any semantic information at all; all that it says is that the token which follows a . must be an identifier. In this context, keywords should be treated as identifiers, and anything else other than an identifier is an error.
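    JavaScript itself behaves this way, which makes it a convenient illustration of the rule: since ES5, a reserved word is accepted as a property name after a `.`, even though the same word is a keyword everywhere else. The object and property names below are made up for the example.

```javascript
// `if` and `class` are reserved words, yet both are legal after a dot,
// because that position can only ever hold a property name.
const obj = {};
obj.if = 1;
obj.class = "keyword-as-name";

// Dot access is sugar for string-keyed access, as in the a.b / a["b"] case.
const sameSlot = obj.if === obj["if"] && obj.class === obj["class"];
console.log(sameSlot);
```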

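    We can do this with start conditions. Specifically, we define a start condition which is only used after a . token. Declaring it with %x makes it exclusive, so rules that carry no start-condition prefix (including the keyword rules) are not tried while it is active: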

    %x selector
    
    %%
    /* White space and comment rules need to explicitly include
     * the selector condition
     */
    <INITIAL,selector>\s+   ;
    
    /* Other rules, including keywords, are unmodified */
    "end"                   return "END";
    
    /* The dot rule triggers a new start condition */
    "."                     this.begin("selector"); return ".";
    
    /* Outside of the start condition, identifiers don't change state. */
    /* Outside of the start condition, identifiers don't change state.
     * Jison hands yytext to the parser as the token's value, so no
     * explicit yylval assignment is needed.
     */
    [A-Za-z_]\w*            return "ID";
    /* Only identifiers are valid in this start condition, and if found
     * the start condition is changed back. Anything else is an error.
     */
    <selector>[A-Za-z_]\w*  this.popState(); return "ID";
    <selector>.             throw new Error("Expecting identifier");
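
    To see what these rules do to the input from the question, here is a minimal plain-JavaScript sketch that imitates the start-condition mechanism by hand. It is not Jison's generated code or its API: the rule table, the `tokenize` function, and the punctuation rule for `(`, `)` and `;` are all assumptions added so that `res.end();` can be tokenized in full. The `\b` on the keyword pattern is there so that `end` does not match the front of a longer identifier.

```javascript
// Hand-written stand-in for the lexer above: an ordered rule table plus a
// stack of start conditions, imitating begin() and popState().
const rules = [
  // Whitespace is skipped in both conditions.
  { states: ["INITIAL", "selector"], re: /^\s+/, action: () => undefined },
  // Keyword rule, active only in INITIAL.
  { states: ["INITIAL"], re: /^end\b/, action: () => "END" },
  // The dot switches to the selector condition.
  { states: ["INITIAL"], re: /^\./,
    action: (lx) => { lx.stack.push("selector"); return "."; } },
  { states: ["INITIAL"], re: /^[A-Za-z_]\w*/, action: () => "ID" },
  // Hypothetical punctuation rule, not part of the answer's lexer.
  { states: ["INITIAL"], re: /^[();]/, action: (lx, text) => text },
  // Inside selector, only an identifier is accepted; then switch back.
  { states: ["selector"], re: /^[A-Za-z_]\w*/,
    action: (lx) => { lx.stack.pop(); return "ID"; } },
  { states: ["selector"], re: /^./,
    action: () => { throw new Error("Expecting identifier"); } },
];

function tokenize(input) {
  const lx = { stack: ["INITIAL"] };
  const tokens = [];
  let rest = input;
  while (rest.length > 0) {
    const state = lx.stack[lx.stack.length - 1];
    // First matching rule wins among the rules active in this condition.
    const rule = rules.find((r) => r.states.includes(state) && r.re.test(rest));
    if (!rule) throw new Error("Unrecognized input: " + rest);
    const text = rule.re.exec(rest)[0];
    rest = rest.slice(text.length);
    const name = rule.action(lx, text);
    if (name !== undefined) tokens.push([name, text]);
  }
  return tokens;
}

console.log(tokenize("res.end();").map(([name]) => name).join(" "));
console.log(tokenize("end").map(([name]) => name).join(" "));
```

    With this sketch, the `end` in `res.end();` comes back as ID because the keyword rule is not active right after the dot, while a bare `end` is still reported as END. Input such as `res.(` raises the "Expecting identifier" error.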