Are there any reasons to start building the symbol table at the lexer stage?
In the book flex & bison: Text Processing Tools, the author gives an example of a lexer that attempts to build a simple symbol table. The following code uses a workaround to distinguish a symbol definition from its references:
/* declaration keywords */
auto |
char |
int |
/* ... skip ... */
volatile { defining = 1; }
/* ... skip ... */
/* punctuators */
"{"|"<%"|";"    { defining = 0; }
This solution will not work in more complicated cases, such as int a = b, c = d; (symbol c will not be marked as a definition). In addition, nested scopes cannot be handled at the lexer stage.
The question "lex and yacc (symbol table generation)" notes that accessing the symbol table from the lexer is conventional, but I still cannot see the advantages, or why a table built in the lexer would be useful later.
One reason is memory management. It is conventional to make a copy of the token strings passed from the lexer to the parser (at least for identifier tokens), but identifiers usually occur more than once in the source text, and only one copy is really necessary.
Rather than perform the copy each time, it can be convenient to "intern" the string in a hash table of identifiers and pass a pointer to the hash table entry instead. That way, the second and subsequent appearances of each symbol do not incur any dynamic allocation. Also, all of the string storage can be owned by the string table data structure, which can simplify the logic for releasing the dynamically allocated storage.
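A minimal sketch of such an interning table in C, assuming a fixed-size chained hash table and an FNV-1a hash (both are illustrative choices, not taken from the book). The names `strent`, `intern` and `free_strings` are hypothetical. `intern` takes a pointer and a length so it can be called directly with flex's `yytext` and `yyleng`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One interned identifier: the characters are stored exactly once. */
struct strent {
    struct strent *next;  /* next entry in the same hash bucket */
    size_t len;           /* length of name, excluding the NUL */
    char *name;           /* the single stored copy of the identifier */
};

#define NBUCKETS 1024
static struct strent *buckets[NBUCKETS];

/* 32-bit FNV-1a hash over the identifier's bytes. */
static uint32_t hash_str(const char *s, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Return the unique entry for text[0..len); allocate only on first sight. */
struct strent *intern(const char *text, size_t len)
{
    assert(text != NULL);
    uint32_t b = hash_str(text, len) % NBUCKETS;
    for (struct strent *e = buckets[b]; e != NULL; e = e->next)
        if (e->len == len && memcmp(e->name, text, len) == 0)
            return e;                     /* repeat occurrence: no allocation */

    struct strent *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->name = malloc(len + 1);
    if (e->name == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->name, text, len);
    e->name[len] = '\0';
    e->len = len;
    e->next = buckets[b];
    buckets[b] = e;
    return e;
}

/* All string storage is owned by the table, so one call releases it. */
void free_strings(void)
{
    for (size_t b = 0; b < NBUCKETS; b++) {
        struct strent *e = buckets[b];
        while (e != NULL) {
            struct strent *next = e->next;
            free(e->name);
            free(e);
            e = next;
        }
        buckets[b] = NULL;
    }
}
```

In the lexer, an identifier rule could then look something like `[A-Za-z_][A-Za-z0-9_]* { yylval.sym = intern(yytext, yyleng); return IDENTIFIER; }`, where the `sym` member of the semantic value and the `IDENTIFIER` token are hypothetical names for whatever the grammar declares. A side benefit is that the parser can compare identifiers by pointer equality instead of `strcmp`.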
That is not exactly a symbol table, since it does not (yet) hold any semantic or scope information. But the string table could certainly be the basic structure which holds the symbol table, at least enough to qualify as the "start [of] building the symbol table".
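One way the string table could grow into the symbol table is sketched below, under the assumption that each entry created by the lexer carries an initially empty pointer that the parser fills in when it sees a declaration. This also covers the nested scopes mentioned in the question: the lexer only creates the entry, while the parser pushes and pops bindings. The names `symbol`, `binding`, `lookup_or_add`, `declare` and `leave_scope` are hypothetical, the linear search stands in for a real hash table to keep the example short, and a real implementation would keep a per-scope list of declared symbols rather than calling `leave_scope` per symbol.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Semantic information attached later by the parser (hypothetical fields). */
struct binding {
    struct binding *shadowed; /* outer declaration hidden by this one */
    int scope_level;          /* nesting depth of the declaration */
    int is_typedef;           /* nonzero if declared with typedef */
};

/* String-table entry created by the lexer; starts with no semantics. */
struct symbol {
    char *name;
    struct binding *binding;  /* NULL until the parser sees a declaration */
};

#define MAXSYMS 256
static struct symbol table[MAXSYMS];
static int nsyms;

/* Lexer side: find or create the entry for an identifier. */
struct symbol *lookup_or_add(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < nsyms; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    if (nsyms == MAXSYMS)
        return NULL;
    char *copy = malloc(strlen(name) + 1);
    if (copy == NULL)
        return NULL;
    strcpy(copy, name);
    table[nsyms].name = copy;
    table[nsyms].binding = NULL;
    return &table[nsyms++];
}

/* Parser side: record a declaration, shadowing any outer one. */
int declare(struct symbol *sym, int scope_level, int is_typedef)
{
    struct binding *b = malloc(sizeof *b);
    if (b == NULL)
        return -1;
    b->shadowed = sym->binding;
    b->scope_level = scope_level;
    b->is_typedef = is_typedef;
    sym->binding = b;
    return 0;
}

/* Parser side: on leaving a scope, restore the outer declaration. */
void leave_scope(struct symbol *sym, int scope_level)
{
    while (sym->binding != NULL && sym->binding->scope_level >= scope_level) {
        struct binding *outer = sym->binding->shadowed;
        free(sym->binding);
        sym->binding = outer;
    }
}
```

The design point is that the lexer never needs to know about scopes: it hands the parser the same `struct symbol *` for every occurrence of a name, and all scope handling happens on the parser side through the `binding` chain.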
In certain languages (C being the canonical example), the lexer may need to consult the semantic information in the symbol table, for instance to tell whether an identifier names a typedef, so the sharing might be more intertwined. But even without that hackery, sharing the basic indexing mechanism may prove useful and does not necessarily violate the principle of separation of concerns.
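For C, the usual reason is that a statement such as `T * x;` is a declaration if `T` is a typedef name and a multiplication otherwise, so the grammar is much simpler if the lexer returns a different token for typedef names. A rough sketch of that classification step follows; the token codes, the `typedef_names` array and the functions `note_typedef` and `classify_identifier` are all hypothetical, and the sketch ignores typedef scoping as well as the timing issues caused by the parser's lookahead token.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token codes; with bison they would come from the generated header. */
enum { IDENTIFIER = 258, TYPEDEF_NAME = 259 };

/* Stand-in for the shared symbol table: names declared with typedef.
   Stored pointers must outlive the table (e.g. interned strings). */
#define MAXTYPES 64
static const char *typedef_names[MAXTYPES];
static int ntypedefs;

/* Parser side: called when a typedef declaration is recognized. */
int note_typedef(const char *name)
{
    assert(name != NULL);
    if (ntypedefs == MAXTYPES)
        return -1;
    typedef_names[ntypedefs++] = name;
    return 0;
}

/* Lexer side: decide which token an identifier lexeme becomes. */
int classify_identifier(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < ntypedefs; i++)
        if (strcmp(typedef_names[i], name) == 0)
            return TYPEDEF_NAME;
    return IDENTIFIER;
}
```

A flex rule would then end with something like `return classify_identifier(yytext);` instead of unconditionally returning an identifier token.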