I'm curious about the best practices for using a different regex engine in place of Perl's default one, and why the modules I've seen are pragmas and not a more traditional OO/procedural interface.
I've seen a handful of modules for replacing the Perl regex engine with PCRE (re::engine::PCRE), TRE (re::engine::TRE), or RE2 (re::engine::RE2) in a given lexical context. I can't find any object-oriented modules for creating/compiling regular expressions that use a different back end. I'm curious why someone would choose to implement this functionality as a pragma rather than as a more typical module. It seems like replacing the Perl regex engine would be a lot harder (depending on the complexity of the API it exposes) than writing an XS module that exposes the API that PCRE, TRE, and RE2 already provide.
I'm curious about...why the modules I've seen are pragmas and not a more traditional OO/procedural interface.
Probably because the Perl regex API, documented in perldoc perlreapi and available since Perl 5.9.5, lets you take advantage of Perl's parser, which gives you a lot of cool features with little code.
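If you use the API, you:
- get split and the substitution operator s/// without implementing them yourself
- don't have to parse regex modifiers yourself (msixpn are passed as flags to your implementation's callback functions)
- can use qr in your programs to quote regular expressions and easily interpolate them into other regexes
- get capture variables such as $1, $+{foo}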
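To make that list concrete, here is a minimal sketch of what calling code looks like under the pragma approach. The pragma line is commented out so the snippet runs on stock Perl. The assumption is that with re::engine::PCRE installed from CPAN, uncommenting it would swap the engine for that block only, while the rest of the code stays as written. Whether every construct behaves identically depends on the engine. The sample text and variable names are made up for illustration.

```perl
use strict;
use warnings;

my ($name, $value, @fields, $masked);

{
    # Assumption: with re::engine::PCRE installed, uncommenting the next
    # line selects that engine for this lexical block only.
    # use re::engine::PCRE;

    my $word = qr/(?<word>\w+)/i;        # qr// with a modifier and a named capture
    my $pair = qr/$word\s*=\s*(\d+)/;    # interpolated into a larger regex

    my $text = "Alpha = 1, beta = 22";   # made-up sample input

    # Capture variables are filled in by Perl, not by the calling code.
    if ($text =~ $pair) {
        ($name, $value) = ($+{word}, $2);
    }

    # split and s/// use the same regex syntax as any other match.
    @fields = split /\s*,\s*/, $text;
    ($masked = $text) =~ s/\d+/N/g;
}

print "name=$name value=$value\n";
print "fields: ", join(" | ", @fields), "\n";
print "masked: $masked\n";
```

An OO wrapper around the same library would need its own methods for each of these operations, which is roughly the code the pragma route lets you skip.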
There are probably more that I've missed. The point is, you get a lot of free code and free functionality with the API. If you look at the implementation of re::engine::PCRE, for example, it's actually fairly short (fewer than 400 lines of XS code).
If you're just looking for an easier way to implement your own regex engine, check out re::engine::Plugin, which lets you write your implementation in Perl instead of C/XS. Do note that there is a long list of caveats, including no support for split and s///.
Alternatively, instead of implementing a completely custom engine, you can extend the built-in engine by using overloaded constants as described in perldoc perlre. This only works in constant regexes; you have to explicitly convert variables before interpolating them into a regex.
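As a rough single-file sketch of the overloaded-constant approach: the \Y| escape and its expansion are borrowed from the perlre example, while the package name MyRe and the test strings are made up here. The handler is installed with overload::constant, which is lexically scoped, so it has to run at compile time (hence the BEGIN block).

```perl
use strict;
use warnings;

package MyRe;    # hypothetical package name, for illustration only
use overload ();

# Expand the made-up escape \Y| into an assertion that matches between
# whitespace and non-whitespace. A doubled backslash is passed through
# unchanged so that \\Y| is not rewritten.
sub convert {
    my ($re) = @_;
    $re =~ s{\\(\\|Y\|)}
            { $1 eq '\\' ? '\\\\' : '(?:(?=\S)(?<!\S)|(?!\S)(?<=\S))' }ge;
    return $re;
}

# Register convert() as the handler for constant pieces of regexes.
sub import { overload::constant('qr' => \&convert) }

package main;

my ($literal_hit, $literal_miss, $converted_hit);
{
    BEGIN { MyRe->import }    # active for the rest of this block only

    # Regex literals are rewritten by the handler before compilation.
    $literal_hit  = "foo bar" =~ /o\Y| b/ ? 1 : 0;
    $literal_miss = "foobar"  =~ /o\Y|b/  ? 1 : 0;

    # Interpolated variables are not seen by the handler, so they are
    # converted by hand before use.
    my $piece     = 'o\Y| b';
    my $converted = MyRe::convert($piece);
    $converted_hit = "foo bar" =~ /$converted/ ? 1 : 0;
}

print "$literal_hit $literal_miss $converted_hit\n";
```

In a real module the package would normally live in its own file and be loaded with use, which takes care of the compile-time call to import.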