How to parse non ascii / arbitrary chars in Ragel


I want to parse the C string char str[] = {0x1b, 'h', 'i'} using Ragel.

I produce this sequence on the bash command line using $'\x1bhi'.

However, I am unable to get Ragel to execute CmdAction.

main.c:

#include <string.h>
#include <stdio.h>
%%{
machine foo;
}%%

%%{
    Space = ' ';
}%%

%%{
    Cmd = ('h'|'i') +;
    action CmdAction
    {
        fprintf(stderr, "cmd:%.*s\n", (int)(te - ts), ts);
    }
}%%

%%{
    main :=
    |*
        [\x1b] Cmd => CmdAction;
        "\x1b" Cmd => CmdAction;
        'E' Cmd => CmdAction;
        space+;
    *|;
}%%

%% write data;


int main( int argc, char **argv ) {
    for (int i = 0; i < strlen(argv[1]); i++) {
        fprintf(stderr, "%d] 0x%02x\n", i, argv[1][i]);
    }
    int cs, res = 0;
    int top;
    char *ts;
    char *te;
    int act;
    char *eof = NULL;
    int stack[128];
    if ( argc > 1 ) {
        char *p = argv[1];
        char *pe = p + strlen(p) + 1;
        %% write init;
        %% write exec;
    }
    printf("result = %i\n", res );
    return 0;
}

// from bash use $'' to produce raw data strings as arg
//main $'\x1bhi'
ragel main.c -o main.c.c
gcc main.c.c -o main
./main $'\x1bhi'
0] 0x1b
1] 0x68
2] 0x69
result = 0
./main $'Ehi'
0] 0x45
1] 0x68
2] 0x69
cmd:Ehi
result = 0

How to parse arbitrary chars in Ragel?

What input would the above Ragel code accept?


Solution

  • Quick fix:

    --- main.c.orig 2025-02-06 13:29:28.501665490 +0300
    +++ main.c      2025-02-06 13:29:58.933601263 +0300
    @@ -19,8 +19,7 @@
     %%{
         main :=
         |*
    -        [\x1b] Cmd => CmdAction;
    -        "\x1b" Cmd => CmdAction;
    +        0x1b Cmd => CmdAction;
             'E' Cmd => CmdAction;
             space+;
         *|;
    

    The problem is the syntax used to specify 0x1b. Ragel does not support the \x escape (and does not need to): the backslash is dropped and \x is read as a plain x, so:

    • [\x1b] matches any one of x, 1, or b (try ./main bhi, ./main xhi, or ./main 1hi with the original code)
    • "\x1b" is the literal string x1b, which can be checked with ./main x1bhi

    But a simple 0x1b works fine as expected.
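
    To make the difference concrete, here is a minimal hand-written C sketch of what the two spellings match. It is not Ragel output: the function names match_esc_cmd and in_misread_class are hypothetical, and the second function only mirrors the reading of [\x1b] described above.

```c
#include <stddef.h>

/* Hand-written stand-in for the fixed pattern `0x1b Cmd`, where
 * Cmd = ('h'|'i')+. This is an illustration, not generated by Ragel.
 * Returns the length of the matched prefix, or 0 if nothing matches. */
size_t match_esc_cmd(const unsigned char *p, size_t n)
{
    size_t i = 1;

    if (n < 2 || p[0] != 0x1b)   /* must start with the ESC byte */
        return 0;
    while (i < n && (p[i] == 'h' || p[i] == 'i'))
        i++;
    return i > 1 ? i : 0;        /* need at least one 'h' or 'i' */
}

/* The original class [\x1b] as described above: with the backslash
 * dropped, it is the set of the three characters 'x', '1' and 'b'. */
int in_misread_class(unsigned char c)
{
    return c == 'x' || c == '1' || c == 'b';
}
```

    With the bytes {0x1b, 'h', 'i'}, match_esc_cmd reports a three-byte match, while in_misread_class rejects 0x1b and accepts 'b'. This is consistent with ./main bhi triggering CmdAction in the original code while ./main $'\x1bhi' does not.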