Methods in yacc to get line numbers

What are the different ways of getting line numbers in yacc? I know about yylineno, but it can be off by one when yacc has already read a lookahead token.

I had the same problem when using yytext in yacc: it didn't give the expected token. I worked around that by setting yylval.str in lex. Is there anything similar for getting the exact line number?

Are there better options in Bison that make this easier and more accurate?

Solution

The role of a scanner is to produce a sequence of lexical tokens, each of which has a type, a semantic value, and possibly a location. The parser receives this stream of tokens, and processes them into a structural representation of the input.

It should be clear that in the parser, "the last semantic value scanned" is of little or no value, particularly since the parser usually needs to ask the scanner to look ahead at least one token in order to decide how to proceed. But more generally, a parser action will be combining information about a series of tokens (and a series of already reduced productions), so there is no "one value" which makes sense in the action.

Similarly, every token has a location in the input, and if the parser needs to associate locations with syntax features, it needs to be able to refer to the location of any given token.

Bison facilitates this process by allowing the scanner to fill in a location object as well as a semantic value (yylval). The location object is called yylloc and, unlike the semantic value, it is normally the same type for every token. If your Bison source uses locations, it will create a location object stack which is synchronized with the semantic value stack. In a rule, the semantic value of a token (or non-terminal) can be referred to as $1, $2, etc.; similarly, the location of the token/non-terminal will be @1, @2, ...

You don't need to tell Bison to collect location information. It will happen automatically if you simply use some location reference (@n) in any parser action.

In fact, you don't need to do much in your parser to make use of location information, since the defaults are often sufficient. Unless you #define the preprocessor macro YYLTYPE, a location object type called YYLTYPE will be declared as follows:

typedef struct YYLTYPE {
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

The declaration will be placed into the generated header file so that it can be used by the scanner as well. The YYLTYPE structure reflects the fact that a token -- and, more importantly, a non-terminal -- spans a range of locations in the source file.

The location structure for a non-terminal will also be filled in automatically by the generated parser, although you are free to modify it by assigning to @$ inside a parser action. The default is to take the first_line and first_column fields from @1 and the last_line and last_column fields from @n, where n is the number of symbols in the right-hand side. In other words, when you reduce a production, the resulting location will span all of the source text representing the tokens of the production.

Although yylloc contains both line and column information, you are not required to use the column data. It is most convenient to leave those fields set to 0, in case you want to use them in a later version of your parser. You could reduce the overhead of the location stack by redefining YYLTYPE, but then you would also need to override the default location action, because it refers to those named fields.

Filling in the yylloc object is entirely the responsibility of your scanner, and flex does not, unfortunately, help you much. Flex will maintain yylineno if you ask it to (%option yylineno) but it won't populate yylloc, so you need to do that yourself. Fortunately, flex does let you define the YY_USER_ACTION macro. This macro is inserted at the beginning of every flex action, and it can be used to copy location information into yylloc.

As a simple example, if none of your tokens span more than one line (or you don't care about the starting line of tokens which do span more than one line), you could simply put this in the prologue of your flex definition:

#define YY_USER_ACTION yylloc.first_line = yylloc.last_line = yylineno;

and enable yylineno tracking with

%option yylineno

Once you've done that, and with no other changes to your flex or bison definitions, you'll be able to write actions such as:

assignment: IDENTIFIER '=' expression {
               printf("%s is defined at line %d\n", $1, @1.first_line);
            }

Because this rule refers to the location of the IDENTIFIER token, it does not matter how many lines the expression occupies. To report the full extent of the assignment instead, you can rely on the default value of @$:

assignment: IDENTIFIER '=' expression {
               printf("%s is defined in lines %d to %d\n",
                      $1, @$.first_line, @$.last_line);
            }