c++ fsm ragel

How to get Ragel to perform different actions on a match and on a parse error


I am new to Ragel and have been trying to parse a specific pattern with it. I want the action done to be executed when the input matches, and parse_error to be executed when it does not, even if only a single character is missing.

Here is the code I have written:

#include <iostream>
#include <string.h>
#include <stdio.h>

%%{
action done {
printf("done\n");
}


action parse_error {
printf("error : %c\n",fc);
}


machine ldf;    
main := (':'.'LoadSdf'.[0-9]+.[a-zA-Z0-9_\-\.])@done |    //execute done
         (^(':'.'LoadSdf'.[0-9]+.[a-zA-Z0-9_\-\.])) $err(parse_error); //execute parse error for no match

}%%

%%write data;
int main(int argc, char** argv)
{
int cs;
 if(argc > 1) {
char *p = argv[1];
char *pe = p+strlen(p) + 1;
%%write init;
%%write exec;
}
 return 0;
}

The behaviour I see is that both done and parse_error are executed when the input matches the pattern exactly.

Can anyone provide some tips on how I can tackle this case?


Solution

  • There are several problems with this code. The first is a technical error: the definition of pe is off by one. It includes the terminating zero character, and your machine shouldn't have to care about zeroes (you could make it handle them, but that only complicates things for no reason). It's also useful to define eof, because input like ":Load" (missing the "Sdf" and the chunks that follow) should be an error, and EOF handling only kicks in when p reaches eof. Both are fixed by

    -char *pe = p+strlen(p) + 1;
    +char *pe = p+strlen(p);
    +char *eof = pe;
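
To make the off-by-one concrete, here is a small plain C++ sketch (not Ragel output; the helper names `scanned_length` and `range_contains_nul` are made up for illustration). It measures the half-open range [p, pe) that a scanner would walk, assuming pe is computed either with or without the extra `+ 1`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Number of characters in the half-open range [p, pe).
// Hypothetical helper, only for illustration.
std::size_t scanned_length(const char* p, const char* pe) {
    assert(p <= pe);  // the range must not be reversed
    return static_cast<std::size_t>(pe - p);
}

// True if the range [p, pe) contains the terminating zero character.
// Hypothetical helper, only for illustration.
bool range_contains_nul(const char* p, const char* pe) {
    return std::memchr(p, '\0', scanned_length(p, pe)) != nullptr;
}
```

With `pe = p + strlen(p)` the range covers exactly the visible characters. With `pe = p + strlen(p) + 1` it is one character longer and ends with the zero byte, so the machine is fed one extra character that the pattern does not accept.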
    

    The other problem is that there is no need to combine a machine with its negation to handle errors. Matching and error reporting are separate things with separate actions. Take a look at the picture of your machine:

    [Figure: original state machine]

    You can see that there is no proper error handling in the middle of the machine, and that at the end done can be invoked several times, because @done fires on every transition into a final state. It is probably only supposed to run when the machine finishes correctly (that is, when EOF is reached in a final state).

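    So if you change your machine definition so that done is an EOF action on the final states (%/) and parse_error is a global error action on all states ($!):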

    main := (':'.'LoadSdf'.[0-9]+.[a-zA-Z0-9_\-\.]) %/done $!(parse_error);
    

    you will probably get what you want:

    $ ./a.out "asdf"
    error : a
    $ ./a.out "qwerty"
    error : q
    $ ./a.out ":Load"
    error : 
    $ ./a.out ":LoadSdf"
    error : 
    $ ./a.out ":LoadSdf1212"
    done
    $ ./a.out ":LoadSdf1q"
    done
    $ ./a.out ":LoadSdf1qwe"
    error : w
    

    The fixed machine looks like this in graphical form:

    [Figure: fixed state machine]
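
For readers who want to check the intended behaviour without running Ragel, the following is a hand-written C++ sketch of the same logic. It is not what Ragel generates; the names `Result` and `match_load_sdf` are made up for illustration. It assumes the semantics described above: "done" only when the input ends in a final state, and an error on the first character that has no transition, or at end of input in a non-final state.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Outcome of one run, mirroring the two actions from the answer.
struct Result {
    bool done;        // true when the input ended in a final state
    char error_char;  // offending character; '\0' if the input ended too early
};

static bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Character class [a-zA-Z0-9_\-\.]
static bool is_tail_char(char c) {
    return is_digit(c) || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           c == '_' || c == '-' || c == '.';
}

// Hand-written equivalent of ':' . 'LoadSdf' . [0-9]+ . [a-zA-Z0-9_\-\.]
Result match_load_sdf(const std::string& input) {
    static const std::string prefix = ":LoadSdf";
    std::size_t i = 0;
    for (; i < prefix.size(); ++i) {
        if (i == input.size()) return {false, '\0'};          // EOF inside prefix
        if (input[i] != prefix[i]) return {false, input[i]};  // wrong character
    }
    assert(i == prefix.size());  // the literal prefix has been consumed

    // OneDigit is not final: the trailing character is still missing.
    // ManyDigits is final: the last digit can serve as the trailing character.
    // Tail is final and accepts nothing further.
    enum class State { NeedDigit, OneDigit, ManyDigits, Tail };
    State state = State::NeedDigit;
    for (; i < input.size(); ++i) {
        const char c = input[i];
        switch (state) {
        case State::NeedDigit:
            if (!is_digit(c)) return {false, c};
            state = State::OneDigit;
            break;
        case State::OneDigit:
        case State::ManyDigits:
            if (is_digit(c)) state = State::ManyDigits;
            else if (is_tail_char(c)) state = State::Tail;
            else return {false, c};
            break;
        case State::Tail:
            return {false, c};  // nothing may follow the trailing character
        }
    }
    const bool accepting = state == State::ManyDigits || state == State::Tail;
    return {accepting, '\0'};
}
```

Calling `match_load_sdf` on the inputs from the listing above gives the same outcomes: ":LoadSdf1212" and ":LoadSdf1q" report done, ":LoadSdf1qwe" reports an error at 'w', and ":Load" reports an error with an empty character because the input ended too early.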