
Lex/yacc only detecting one token


I am trying to write a LaTeX parser using lex and yacc but I am struggling. Here is my lexer:

%{
#include "y.tab.h"
#include <stdio.h>
%}

%%
^\\begin\{.*\} {return BEG;}
%%

int yywrap() {
    return 1;
}

and here is my parser:

%{
#include <stdio.h>
#include <stdlib.h>

void yyerror(char *s);
int yylex();
extern FILE *yyin;
%}

%token BEG

%%
beg: BEG {printf("Hello world\n");}
%%

void yyerror(char *s) {
    fprintf(stderr, "%s\n", s);
}

int main(int argc, char **argv) {

    if (argc != 2) {
        fprintf(stderr, "Wrong number of arguments provided\n");
        exit(1);
    }
    yyin = fopen(argv[1], "r");
    if (!yyin) {
        fprintf(stderr, "Not a valid filename\n");
        exit(1);
    }
    yyparse();
    return 0;
}

Now, if I run this on the LaTeX snippet

\begin{document}
\begin{equation}
    x = 3
\end{equation}
\end{document}

I get

Hello world

syntax error

It seems like the parser is only seeing one \begin pattern, instead of two. Why is that? I really don't see why. Thank you in advance.

EDIT: I tried something like

lines: line
     | lines line
     ;
line: beg '\n'
    | ID '\n'
    ;
beg: BEG {printf("Hello world\n");}
   ;

where ID corresponds to the regex .*, but I get the same error.


Solution

    Lexer:

    %{
    #include "y.tab.h"
    #include <stdio.h>
    #include <string.h>
    %}
    
    %%
    ^\\begin\{.*\} {return BEG;}
    \n  { return *yytext; /* hand each newline to the parser as the token '\n' */ }
    %%
    
    int yywrap() {
        return 1;
    }
    

    Parser:

    %{
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    void yyerror(char *s);
    int yylex();
    extern FILE *yyin;
    %}
    
    %token BEG
    %start beg
    %%
    beg: BEG '\n' {printf("Hello world\n");}
    %%
    
    void yyerror(char *s) {
        fprintf(stderr, "%s\n", s);
    }
    
    int main(int argc, char **argv) {
    
        if (argc != 2) {
            fprintf(stderr, "Wrong number of arguments provided\n");
            exit(1);
        }
        yyin = fopen(argv[1], "r");
        if (!yyin) {
            fprintf(stderr, "Not a valid filename\n");
            exit(1);
        }
        yyparse();
        return 0;
    }
    

    The code above is reconstructed from memory, so treat it as a starting point rather than a finished parser. I would also suggest that you first write down which tokens you expect from the lexer, and then build the grammar around what you actually want to do with those tokens.
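One way to make that token list concrete is to classify each line of the sample input by hand before writing any lex rules. The sketch below is a plain C stand-in, not flex output. Only BEG comes from the original lexer; `TOK_END`, `TOK_TEXT`, and all function names are hypothetical and exist only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Only BEG exists in the original lexer; TOK_END and TOK_TEXT are
   hypothetical additions for illustration. */
enum token_kind { TOK_BEG, TOK_END, TOK_TEXT };

/* Classify one line (without its trailing newline).
   The BEG test mirrors the lex pattern for a begin line: the line
   starts with \begin{ and has a '}' somewhere after it. */
enum token_kind classify_line(const char *line)
{
    if (strncmp(line, "\\begin{", 7) == 0 && strchr(line + 7, '}') != NULL)
        return TOK_BEG;
    if (strncmp(line, "\\end{", 5) == 0 && strchr(line + 5, '}') != NULL)
        return TOK_END;
    return TOK_TEXT;
}

/* Human-readable name for a token kind. */
const char *token_name(enum token_kind kind)
{
    switch (kind) {
    case TOK_BEG: return "BEG";
    case TOK_END: return "END";
    default:      return "TEXT";
    }
}

/* Print the token kind next to each input line, one per row. */
void print_inventory(const char *const *lines, size_t count)
{
    for (size_t i = 0; i < count; i++)
        printf("%-4s  %s\n", token_name(classify_line(lines[i])), lines[i]);
}
```

Calling `print_inventory` on the five lines of the LaTeX snippet shows that the input contains more kinds of lines than the single BEG rule covers, so the lexer and the grammar both need rules for them.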

    In the following grammar:

    lines: line
         | lines line
         ;
    line: beg '\n'
        | ID '\n'
        ;
    beg: BEG {printf("Hello world\n");}
       ;
    

    lines is the start symbol; the nonterminals are lines, line, and beg, and the terminals (tokens) are ID, BEG, and '\n'. This grammar only works if your lexer actually returns all of these tokens. The lexer you posted returns only BEG, so the parser never sees an ID or a '\n'; characters that match no rule are simply echoed to the output by lex's default rule.

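    The grammar I used above is the rule below. It has beg as its start symbol and expects one BEG token followed by one '\n' token, at which point it prints "Hello world". Because it accepts exactly one such line, a second \begin line in the input still ends in a syntax error; to accept several lines, the rule has to be made recursive, as lines is in your grammar.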

    beg: BEG '\n'  {printf("Hello world\n");}
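To see why a rule for a single line is not enough for the sample input, here is a hand-written C stand-in (not flex or bison output) that checks a token stream against both grammar shapes. Only BEG comes from the original code; `NEWLINE`, `END_OF_INPUT`, and both function names are made up for this illustration, and every token array is assumed to end with `END_OF_INPUT`.

```c
#include <stddef.h>

/* NEWLINE stands for the '\n' token; END_OF_INPUT marks the end of the
   token stream (what yacc sees when yylex() returns 0). */
enum token { BEG, NEWLINE, END_OF_INPUT };

/* Mirrors:  beg: BEG '\n' ;
   Accepts exactly one BEG-plus-newline pair and nothing after it. */
int accepts_single_beg(const enum token *t)
{
    return t[0] == BEG && t[1] == NEWLINE && t[2] == END_OF_INPUT;
}

/* Mirrors:  lines: line | lines line ;   line: BEG '\n' ;
   Accepts one or more BEG-plus-newline pairs. */
int accepts_beg_lines(const enum token *t)
{
    size_t i = 0;
    int pairs = 0;

    /* Consume pairs for as long as they keep coming. */
    while (t[i] == BEG && t[i + 1] == NEWLINE) {
        i += 2;
        pairs++;
    }
    return pairs >= 1 && t[i] == END_OF_INPUT;
}
```

With the stream BEG, NEWLINE, BEG, NEWLINE, END_OF_INPUT (the two \begin lines of the sample), the first function rejects the input and the second accepts it, which is the difference between the single rule and the recursive lines rule.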