
Optimize Ragel semantic conditions for any data of known length


The Ragel manual (section 6.5, Semantic Conditions) has an example showing how to write a grammar for variable-size structures using the when clause:

action rec_num { i = 0; n = getnumber(); }
action test_len { i++ < n }
data_fields = (
'd'
[0-9]+ %rec_num
':'
( [a-z] when test_len )*
)**;

This works fine for small structures, but it slows down for larger ones because the parser evaluates the condition on every character.
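To make the cost concrete, here is a plain C sketch (not what Ragel actually generates) of the difference: with ( [a-z] when test_len )* the guard i++ < n is evaluated once per byte, while a known length n needs only one bounds check and one memcpy. The function names copy_guarded and copy_bulk are hypothetical, and the sketch ignores the [a-z] character class.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Rough model of ( [a-z] when test_len )*: the guard i++ < n is
   evaluated on every byte (the [a-z] class check is omitted).
   Returns the number of bytes copied into out. */
size_t copy_guarded(const char *p, const char *pe, char *out, size_t n)
{
    size_t i = 0;       /* counter that rec_num resets in the grammar */
    size_t copied = 0;
    while (p < pe && i++ < n)   /* one condition test per byte */
        out[copied++] = *p++;
    return copied;
}

/* Known length: one bounds check, then a single bulk copy.
   Returns false if fewer than n bytes are available. */
bool copy_bulk(const char *p, const char *pe, char *out, size_t n)
{
    if ((size_t)(pe - p) < n)
        return false;
    memcpy(out, p, n);
    return true;
}
```

Both copy the same bytes; the difference is that copy_guarded does work proportional to n in the guard, whereas copy_bulk pays for a single comparison before handing the copy to memcpy.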

What I am trying to do is skip scanning entirely and copy the data straight into a buffer, using a grammar like this (note the any*):

action rec_num { i = 0; n = getnumber(); }
action test_len { i++ < n }
data_fields = (
'd'
[0-9]+ %rec_num
':'
( any* when test_len )*
)**;

In other words, I want to copy the n bytes into the buffer in one step, without iterating over each character. How can I do this without leaving the parser context?


Solution

  • You probably need to take matters into your own hands. The Ragel user guide notes that you can modify p (the pointer that fpc refers to) from within action code, so this should be safe enough. This assumes you're processing all of your data in one chunk (i.e., a data field is never split across buffers):

    %%{
    machine foo;
    
    action rec_num { i = 0; n = getnumber(); }
    action test_len { i++ < n }
    action buffer_data_field {
      /* p still points at ':'; the n data bytes must fit in p+1 .. pe-1. */
      if ((size_t)(pe - p - 1) < n) { fgoto *foo_error; }
      buffer(p + 1, n);
      /* Ragel increments p after this action, so leaving p on the last
         data byte makes scanning resume at the byte after the data. */
      p += n;
      /* Field fully consumed; nothing more is expected at EOF. */
      n = 0;
    }
    action buffer_data_field_eof {
      /* Input ended while data was still expected (n not yet cleared). */
      if (n) { fgoto *foo_error; }
    }
    
    data_fields = (
    'd'
    [0-9]+ %rec_num
    ':' 
    $buffer_data_field
    $eof(buffer_data_field_eof)
    )**;
    
    main := data_fields;
    }%%
    

    If the data arrives in chunks, you could split the buffering out into a separate machine:

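    buffer_data :=
        any+
        # Input ended while field data was still expected.
        $eof{ fnext *foo_error; }
        ${
            /* p points at the next unread data byte; n bytes remain. */
            size_t avail = (size_t)(pe - p);
            if (avail >= n) {
                buffer(p, n);
                /* Ragel's p++ after this action moves p just past the data. */
                p += n - 1;
                fnext data_fields;
            } else {
                buffer_partial(p, avail);
                n -= avail;
                /* p++ makes p == pe; parsing resumes here with the next chunk. */
                p += avail - 1;
            }
        }
        ;
    
    
    # data_fields must be an instantiation (:=) rather than a
    # definition (=) so that fnext can target it.
    data_fields := (
        'd'
        [0-9]+ %rec_num
        ':'  ${ if (n) fnext buffer_data; }
        )**;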
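For comparison, the same chunked idea can be sketched as a hand-written C parser outside Ragel: the parser state (the counterpart of cs) and the remaining byte count n survive between calls, and while a field is being read each call copies min(n, bytes available) with one memcpy rather than testing a condition per byte. All names here (struct parser, parser_feed, parser_finish, emit, the fixed 256-byte output buffer) are hypothetical illustrations, not Ragel-generated code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hand-written parser for ( 'd' [0-9]+ ':' <n bytes> )*.
   Not Ragel output; names and the fixed buffer size are illustrative. */
enum st { ST_D, ST_FIRST_DIGIT, ST_DIGITS, ST_DATA, ST_ERROR };

struct parser {
    enum st cs;       /* current state; survives between chunks like Ragel's cs */
    size_t n;         /* data bytes of the current field still to copy */
    char out[256];    /* collected field data */
    size_t out_len;
};

void parser_init(struct parser *ps)
{
    ps->cs = ST_D;
    ps->n = 0;
    ps->out_len = 0;
}

/* Stand-in for buffer()/buffer_partial(): append len bytes to out. */
static int emit(struct parser *ps, const char *p, size_t len)
{
    if (len > sizeof ps->out - ps->out_len)
        return -1;
    memcpy(ps->out + ps->out_len, p, len);
    ps->out_len += len;
    return 0;
}

/* Feed one chunk [p, pe). Returns 0 if the input is valid so far,
   -1 on a syntax error or output overflow. Digit overflow is not checked. */
int parser_feed(struct parser *ps, const char *p, const char *pe)
{
    while (p < pe && ps->cs != ST_ERROR) {
        switch (ps->cs) {
        case ST_D:
            if (*p != 'd') { ps->cs = ST_ERROR; break; }
            ps->n = 0;
            ps->cs = ST_FIRST_DIGIT;
            p++;
            break;
        case ST_FIRST_DIGIT:
        case ST_DIGITS:
            if (*p >= '0' && *p <= '9') {
                ps->n = ps->n * 10 + (size_t)(*p - '0');
                ps->cs = ST_DIGITS;
            } else if (*p == ':' && ps->cs == ST_DIGITS) {
                /* like ':' ${ if (n) fnext buffer_data; } */
                ps->cs = ps->n ? ST_DATA : ST_D;
            } else {
                ps->cs = ST_ERROR;
                break;
            }
            p++;
            break;
        case ST_DATA: {
            /* Copy min(n, available) bytes at once instead of testing
               a condition on each byte. */
            size_t avail = (size_t)(pe - p);
            size_t take = avail < ps->n ? avail : ps->n;
            if (emit(ps, p, take) != 0) { ps->cs = ST_ERROR; break; }
            p += take;
            ps->n -= take;
            if (ps->n == 0)
                ps->cs = ST_D;
            break;
        }
        case ST_ERROR:
            break;
        }
    }
    return ps->cs == ST_ERROR ? -1 : 0;
}

/* Call once at end of input: valid only between records. */
int parser_finish(const struct parser *ps)
{
    return ps->cs == ST_D ? 0 : -1;
}
```

For example, feeding "d5:hellod3:abcd0:" as the two chunks "d5:h" and "ellod3:abcd0:" collects "helloabc" in out, and parser_finish then succeeds. If the input stops partway through a field, parser_finish returns -1, which corresponds to the $eof error in buffer_data.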