Parsing binary files in Raku


I would like to parse binary files in Raku using its regex / grammar engine, but I haven't found a way to do it because the input is coerced to a string.

Is there a way to avoid this string coercion and use objects of type Buf or Blob?

I was thinking that maybe it is possible to change something in the metamodel?

I know that I can use unpack, but I would really like to use the grammar engine instead, for more flexibility and readability.

Am I hitting an inherent limit of Raku's capabilities here?

And before someone tells me that regexes are for strings and that I shouldn't do this, I should point out that Perl's regex engine can match bytes as far as I know, and I could probably use it with Regexp::Grammars, but I would prefer to use Raku instead.

Also, I don't see any fundamental reason why regexes should be reserved for strings: an NFA from automata theory isn't intrinsically made for characters rather than bytes.


Solution

  • Is there a way to avoid this string coercion and use objects of type Buf or Blob?

    Unfortunately not at present. However, one can use the Latin-1 encoding, which gives a meaning to every byte, so any byte sequence can be decoded as Latin-1 and the resulting string can then be matched using a grammar.

    Also, I don't see any fundamental reason why regexes should be reserved for strings: an NFA from automata theory isn't intrinsically made for characters rather than bytes.

    There isn't one; it's widely expected that the regex/grammar engine will be rebuilt at some point in the future (primarily to deal with performance limitations), and that would be a good point to also consider handling bytes as well as codepoint-level strings (Uni).
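
The Latin-1 workaround mentioned above could look roughly like the following. This is only a minimal sketch: the byte layout (a two-byte "BM" marker, one count byte, then the remaining payload) and the names Record, magic, len and payload are hypothetical and chosen purely for illustration. It decodes a Blob with decode('latin-1'), parses the resulting string with a grammar, and converts captures back to bytes with encode('latin-1').

```raku
# Hypothetical input: "BM" marker, one count byte, then three payload bytes.
my $blob = Blob.new(0x42, 0x4D, 0x03, 0xFF, 0x00, 0x80);

# Latin-1 maps every byte value 0x00..0xFF to the codepoint with the same
# number, so decoding cannot fail on arbitrary binary data.
my $text = $blob.decode('latin-1');

# Hypothetical grammar for the layout above (all names are illustrative).
grammar Record {
    token TOP     { <magic> <len> <payload> }
    token magic   { 'BM' }   # literal marker bytes 0x42 0x4D
    token len     { . }      # exactly one character, i.e. one byte here
    token payload { .* }     # everything that remains
}

my $m = Record.parse($text);

# A captured character's codepoint is the original byte value.
say $m<len>.Str.ord;

# Encoding a capture as Latin-1 again yields the original bytes.
say $m<payload>.Str.encode('latin-1').list;
```

One caveat to keep in mind: Raku strings work at the grapheme level, and a CR followed by LF (bytes 0x0D 0x0A) is treated as a single grapheme, so a lone `.` would consume both bytes in that case. Binary formats in which that byte pair can occur would need extra care with this approach.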