Search code examples
compiler-constructionlexjflex

Significance of yy in scanner/lexer such as jflex


Whats is the significance of the prefix yy in lexers?

I see methods/fields such as yyline, yycolumn, yytext, e.t.c

Is there some historical significance to the prefix yy in the world of lexers?

PS: I am completely new to compilers


Solution

  • The yy prefix naming convention actually comes from the parser generator Yacc. Lex was originally written to do the lexical analysis for parsers built with Yacc, and so it chose the name yylex for the generated scanner because that is what Yacc expects the scanner to be called.

    To simplify code generation, both Yacc and Lex use global variables to communicate most information, including the semantic value of the lexemes and their physical position. Again, the impetus came from Yacc, which expected to find the semantic value for each token in the global variable yylval (which you could think of as meaning yy-lexeme-value).

    From there, it was convenient to just require users of scanner and parser generators to avoid all symbols with external linkage whose names start yy, leaving them for the generated code. This wasn't seen as much of a restriction since it would have been pretty uncommon for programmers to use a variable with a name starting yy. Even now, it seems unlikely -- I can't recall ever seeing a question on SO related to a variable with a name starting yy conflicting with an internal variable in Lex/Yacc generated code.

    Other than being an unusual name prefix, yy was intended to relate to the "y" in "Yacc". As it happens, Yacc was so-named as an acronym for "Yet Another Compiler Compiler", which was a self-deprecating and ironic commentary based on the number of attempts to build automated parser generators current in the early 1970s. For a variety of reasons, Yacc emerged the winner in this crowded field, and consequently its interface decisions, for better or worse, became a kind of informal standard for future products.