Tags: regex, perl, pcre, re2

Why implement a different regex engine (e.g. PCRE) as a pragma?


I'm curious about the best practices for using a different regex engine in place of the default Perl one and why the modules I've seen are pragmas and not a more traditional OO/procedural interface.

I've seen a handful of modules for replacing the Perl regex engine with PCRE (re::engine::PCRE), TRE (re::engine::TRE), or RE2 (re::engine::RE2) in a given lexical scope. I can't find any object-oriented modules for creating/compiling regular expressions that use a different back end. Why would someone choose to implement this functionality as a pragma rather than as a more typical module? It seems like replacing the Perl regex engine would be a lot harder (depending on the complexity of the API it exposes) than writing an XS module that exposes the API that PCRE, TRE, and RE2 already provide.


Solution

  • I'm curious about...why the modules I've seen are pragmas and not a more traditional OO/procedural interface.

    Probably because the Perl regex API, documented in perldoc perlreapi and available since 5.9.5, lets you take advantage of Perl's parser, which gives you a lot of cool features with little code.

    If you use the API, you:

    • don't have to implement your own version of split and the substitution operator s///
    • don't have to write your own code to parse regex modifiers (msixpn are passed as flags to your implementation's callback functions)
    • can take advantage of optimizations like constant regexes being compiled only once (at compile time) and regexes containing interpolated variables being compiled only when the variables change
    • can use qr in your programs to quote regular expressions and easily interpolate them into other regexes
    • can easily set numbered and named capture variables, e.g. $1, $+{foo}
    • don't force users of your engine to rewrite all of their code to use your API; they can simply add a pragma

    There are probably more that I've missed. The point is, you get a lot of free code and free functionality with the API. If you look at the implementation of re::engine::PCRE, for example, it's actually fairly short (< 400 lines of XS code).
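To make the list above concrete, here is a minimal sketch of ordinary Perl code that relies on those features: split, s///, modifiers, qr interpolation, and numbered and named captures. It runs on the built-in engine so that it works without any CPAN modules. The commented-out use line marks where a pragma such as re::engine::PCRE would go, assuming that module is installed. The sketch does not verify that any particular alternative engine accepts every construct used here, and the variable names and sample data are made up for illustration.

```perl
use strict;
use warnings;

# Assumption: with re::engine::PCRE installed from CPAN, uncommenting the
# next line would select that engine for the rest of this lexical scope.
# Nothing below would need to be rewritten.
# use re::engine::PCRE;

# qr// quotes a regex; it can then be interpolated into a larger one.
my $word = qr/(?<word>\w+)/;
my $pair = qr/$word=(?<num>\d+)/;

my $line = "alpha=1, beta=22";

# split is provided by Perl, not by the engine author.
my @fields = split /,\s*/, $line;

my %seen;
for my $field (@fields) {
    if ($field =~ $pair) {
        # $1 and $+{word} refer to the same group; $+{num} is group 2.
        my ($name, $num) = ($1, $+{num});
        $seen{$name} = $num;
    }
}

# Modifiers such as /i are parsed by Perl and handed to the engine as flags.
my $has_alpha = $line =~ /^ALPHA/i ? 1 : 0;

# s/// is also provided by Perl.
(my $masked = $line) =~ s/\d+/N/g;

print join(", ", map { "$_ => $seen{$_}" } sort keys %seen), "\n";
print "$masked\n";
```

An OO or procedural wrapper around a C library would have to offer its own equivalent of each of these operations, and callers would have to switch to them.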

    Alternatives

    If you're just looking for an easier way to implement your own regex engine, check out re::engine::Plugin, which lets you write your implementation in Perl instead of C/XS. Do note that there is a long list of caveats, including no support for split and s///.

    Alternatively, instead of implementing a completely custom engine, you can extend the built-in engine by using overloaded constants, as described under "Creating Custom RE Engines" in perldoc perlre. This only works for the constant parts of a regex; you have to explicitly convert variables before interpolating them into a regex.
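The overloaded-constants approach can be sketched in a single file as follows. This is adapted from the example in perldoc perlre, which adds a made-up escape \Y| that matches at a boundary between whitespace and non-whitespace. The package name MyRe is hypothetical, and the BEGIN block stands in for the use MyRe; line you would write if the package lived in its own file.

```perl
use strict;
use warnings;

package MyRe;    # hypothetical package name, not a CPAN module

use overload ();

# Called at compile time; installs a handler for constant regex pieces
# in the lexical scope that is currently being compiled.
sub import {
    overload::constant('qr' => \&convert);
}

# Rewrite the made-up escape \Y| into plain lookarounds. A doubled
# backslash is consumed as a unit so that \\Y| is left alone.
sub convert {
    my ($source) = @_;
    my %rules = (
        '\\' => '\\\\',
        'Y|' => '(?:(?=\S)(?<!\S)|(?!\S)(?<=\S))',
    );
    $source =~ s{ \\ ( \\ | Y . ) }
                { $rules{$1} // die "invalid escape \\$1\n" }sgex;
    return $source;
}

package main;

# Stand-in for "use MyRe;" when the package is defined in the same file.
BEGIN { MyRe->import }

# Constant regexes are rewritten by the handler at compile time.
my $at_edge  = "foo bar" =~ /o\Y| / ? 1 : 0;
my $mid_word = "foobar"  =~ /o\Y|b/ ? 1 : 0;

# Interpolated variables bypass the handler, so convert them by hand.
my $edge    = MyRe::convert('\Y|');
my $via_var = "foo bar" =~ /o$edge / ? 1 : 0;

print "at_edge=$at_edge mid_word=$mid_word via_var=$via_var\n";
```

The last match shows the caveat mentioned above: a string held in a variable is not passed through the handler, so it has to be converted explicitly before it is interpolated.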