How to parse from char array using yacc?


I am trying to parse a string from an in-memory buffer.

Lex code

%{
#include <stdio.h>
#include "y.tab.h"
%}

%%
"Type:[0-9]+" {
            printf("lex-TYPE\n");
            return TYPE;
        };
%%

Yacc code

%{
#include <stdio.h>
#include <string.h>
extern char *yytext;
%}

%start general

%token TYPE

%%
general: |
        general TYPE {
                            printf("gen %c\n", yytext[strlen("TYPE:")]);
                        }
        ;
%%

C code

#include <stdio.h>
#include "y.tab.h"

int main()
{
    printf("6\n");
    yy_scan_buffer("TYPE:0 ");
    printf("8\n");
//    yylex();
    yyparse();

    return 0;
}

void yyerror(char *s)
{
fprintf( stderr, "%s\n" ,s);
}

int yywrap()
{
    return(1);
}

Compile

# lex bnf.lex
# yacc -d bnf.yacc
# cc main.c y.tab.c lex.yy.c -o test
# ./test

Output:

6
8
TYPE:)   <--this is input from keyboard
TYPE:)   <--I don't know why it is copied
^C

I don't see it recognizing my lexeme, and I don't understand why it doesn't load my buffer from yy_scan_buffer.

I want to pass different strings as arguments, parse them, and do some magic.

Can you help me?

#UPD

C code

#include <stdio.h>
#include "y.tab.h"

int main()
{
    printf("6\n");
    yyscan_t scanner;
    YY_BUFFER_STATE buf;
    yylex_init(&scanner);
    buf = yy_scan_string("TYPE:102 ", scanner);
    yylex(scanner);
    yy_delete_buffer(buf, scanner);
    yylex_destroy(scanner);
    printf("8\n");
    yylex();
//    yyparse();

    return 0;
}

Output

# cc main.c y.tab.c lex.yy.c -o test
main.c: In function ‘main’:
main.c:7: error: ‘yyscan_t’ undeclared (first use in this function)
main.c:7: error: (Each undeclared identifier is reported only once
main.c:7: error: for each function it appears in.)
main.c:7: error: expected ‘;’ before ‘scanner’
main.c:8: error: ‘YY_BUFFER_STATE’ undeclared (first use in this function)
main.c:8: error: expected ‘;’ before ‘buf’
main.c:9: error: ‘scanner’ undeclared (first use in this function)
main.c:10: error: ‘buf’ undeclared (first use in this function)

This is the second way to pass a string into yacc, but it produces a lot of errors and I don't know why they appear.

Can you help me?

#UPD

# lex --version
flex 2.5.35
# yacc --version
bison (GNU Bison) 2.3

UPD

One more try

typedef struct yy_buffer_state * YY_BUFFER_STATE;
extern int yyparse();
extern YY_BUFFER_STATE yy_scan_string(char * str);
extern void yy_delete_buffer(YY_BUFFER_STATE buffer);

int main()
{
    printf("6\n");
    char string[] = "TYPE:12";
    YY_BUFFER_STATE buffer = yy_scan_string(string);
    yyparse();
    yy_delete_buffer(buffer);
    printf("8\n");
//    yylex();
//    yyparse();

    return 0;
}

Output

# ./test 
6
TYPE:128
# 

So, it didn't find any lexeme. Why?

UPD

After John Bollinger's answer I replaced my .lex and .c files with his.

Removed Type:[0-9]+

Added to lex

Type:       {
            printf("lex-TYPE\n");
            return TYPE;
            };
[0-9]+ {
        printf("lex-D\n");
        return DIGIT;
        };

Changed yacc

%%
general: |
        general TYPE DIGIT {
                            printf("gen %c\n", yytext[0]);
                        }
        ;
%%

And now I see

# ./test 
6
8
TYPE:lex-D
syntax error

So I finally match a pattern, but why does the parse fail?


Solution

  • TYPE:)   <--this is input from keyboard
    TYPE:)   <--I don't know why it is copied
    

    There seem to be two issues there:

    • why the input is read from the keyboard instead of the specified buffer:

      • Because you have not set up the in-memory buffer correctly, and
      • because you have not called yy_scan_buffer() correctly.
    • why the input is echoed to the output:

      Because it does not match any lexer rule, and because the default rule, which is not overridden in the provided scanner definition, writes otherwise-unmatched characters to the standard output.

    In more detail:

    The docs for yy_scan_buffer() specify that it

    scans in place the buffer starting at base, consisting of size bytes, the last two bytes of which must be YY_END_OF_BUFFER_CHAR (ASCII NUL). These last two bytes are not scanned; thus, scanning consists of base[0] through base[size-2], inclusive.

    and

    If you fail to set up base in this manner (i.e., forget the final two YY_END_OF_BUFFER_CHAR bytes), then yy_scan_buffer() returns a NULL pointer instead of creating a new input buffer.

    (Emphasis added.)

    Because you did not ensure that the last two bytes of the buffer you passed are YY_END_OF_BUFFER_CHAR, the buffer is not set up correctly. Checking the function's return value would have revealed the problem. You should always do that for functions that can fail and that report failure through their return values (and there are many of those).

    Moreover, yy_scan_buffer() requires two arguments, the first a pointer to the buffer, and the second its effective size. All else notwithstanding, you elicit undefined behavior by calling it with the wrong number of parameters. Perhaps you were looking for yy_scan_string() instead.
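    The buffer layout that yy_scan_buffer() expects can be sketched in plain C. This is only a minimal illustration: make_flex_buffer() is a hypothetical helper name, not part of flex, and it assumes the string to be scanned is an ordinary NUL-terminated C string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * make_flex_buffer() is a hypothetical helper, not part of flex.
 * It copies s into freshly allocated memory and appends the two NUL
 * bytes (YY_END_OF_BUFFER_CHAR) that yy_scan_buffer() requires.
 * On success it returns the buffer and stores its total size,
 * including both terminators, in *size_out. The caller frees it.
 */
char *make_flex_buffer(const char *s, size_t *size_out)
{
    assert(s != NULL && size_out != NULL);

    size_t len = strlen(s);
    char *buf = malloc(len + 2);
    if (buf == NULL)
        return NULL;

    memcpy(buf, s, len);
    buf[len] = '\0';      /* first YY_END_OF_BUFFER_CHAR */
    buf[len + 1] = '\0';  /* second YY_END_OF_BUFFER_CHAR */
    *size_out = len + 2;
    return buf;
}
```

    From the scanner's user code section, the result would then be passed as yy_scan_buffer(buf, size), with the return value checked against NULL. Because yy_scan_buffer() scans in place rather than copying, the memory has to stay valid until scanning is finished, and you free it yourself afterwards.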

    In addition, your compiler ought to be warning you about calling a function that has not been declared. It is unclear to what extent the buffer-manipulation functions are meant to be called from outside the scanner, as opposed to from your scanner rules, but at minimum you should use them only from within the scanner definition (including the user code section).

    As for why the input does not match, there are actually three problems. First, you've quoted the pattern. In Flex, double quotes make everything between them literal, so the rule matches only the exact text Type:[0-9]+, brackets and plus sign included. Second, Flex patterns are case-sensitive by default, so Type: does not match TYPE:. Third, the input you typed contains no match for the [0-9]+ part of the pattern. Any one of these on its own would be sufficient to prevent the input presented from matching, and the first two also apply to the intended contents of your in-memory buffer.
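    Two of the matching requirements, the digits and the letter case, can be tried outside flex. The sketch below is only an analogy: it uses POSIX extended regular expressions from <regex.h>, which resemble flex patterns but are not the same thing, and regex_matches() is a hypothetical helper name. The ^ and $ anchors stand in for flex matching a token as a whole.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * regex_matches() is a hypothetical helper: it returns 1 if text
 * matches the POSIX extended regular expression in pattern, else 0.
 * It also returns 0 if the pattern fails to compile.
 */
int regex_matches(const char *pattern, const char *text)
{
    regex_t re;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;

    int ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

    With this helper, regex_matches("^Type:[0-9]+$", "TYPE:0") is 0 because the case differs, regex_matches("^TYPE:[0-9]+$", "TYPE:)") is 0 because no digit follows the colon, and regex_matches("^TYPE:[0-9]+$", "TYPE:0") is 1. Flex behaves the same way on these two points unless the scanner is built case-insensitive.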

    Here's a version of the scanner definition that works as you seem to want, in conjunction with complementary code in the main C source file:

    %{
    #include <stdio.h>
    #include "y.tab.h"
    %}
    
    %%
    TYPE:[0-9]+ {
                printf("lex-TYPE\n");
                return TYPE;
            };
    %%
    
    static YY_BUFFER_STATE my_string_buffer;
    
    int my_scan_string(const char *s) {
        // insist on cleaning up any existing buffer before setting up a new one
        if (my_string_buffer != NULL) return -1;
    
        // Set up and switch to a buffer for scanning the contents of the
        // specified string.  A copy of the string will be made.
        my_string_buffer = yy_scan_string(s);
        return (my_string_buffer == NULL) ? -1 : 0;
    }
    
    void my_cleanup(void) {
        // No effect if my_string_buffer is NULL
        yy_delete_buffer(my_string_buffer);
        // ... but avoid trying to free the same buffer twice
        my_string_buffer = NULL;
    }
    

    Note that the pattern in the single scanner rule is corrected, but more importantly that the code relying on scanner internal interfaces appears in the scanner definition, in the "user code" section. This ensures that you do not need to guess or duplicate the internal interfaces.

    This does not require any changes to your parser definition, but it does require changes to your main source file:

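    #include <stdio.h>
    #include <stdlib.h>   // for exit()
    #include "y.tab.h"
    
    // may not be declared in y.tab.h by older yacc/bison versions
    int yyparse(void);
    int my_scan_string(const char *s);
    void my_cleanup(void);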
    
    int main()
    {
        printf("6\n");
        if (my_scan_string("TYPE:0 ") != 0) {
            fputs("error setting up an internal buffer\n", stderr);
            exit(1);
        }
        printf("8\n");
        yyparse();
        my_cleanup();
    
        return 0;
    }
    
    void yyerror(char *s) {
        fprintf(stderr, "%s\n" ,s);
    }
    
    int yywrap(void) {
        return 1;
    }