
perl using constant in regex


I'm wondering about using constants in Perl regexes. I want to do something similar to:

use constant FOO => "foo";
use constant BAR => "bar";

$somevar =~ s/prefix1_FOO/prefix2_BAR/g;

Of course, inside the regex, FOO is taken as the three literal letters F, O, O instead of being expanded to the constant's value.
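A minimal reproduction of the problem. The sample strings and variable names are made up for illustration; the point is that the substitution only fires on the literal text "FOO", never on the constant's value "foo".

```perl
use strict;
use warnings;
use constant FOO => 'foo';
use constant BAR => 'bar';

# Inside the pattern, FOO is just the letters F, O, O: the constant is never
# called, so a string containing its value ('foo') is left alone...
(my $with_value = 'prefix1_foo') =~ s/prefix1_FOO/prefix2_BAR/g;

# ...while a string containing the literal letters is rewritten, and the
# replacement is the literal text 'BAR', not the value of BAR.
(my $with_literal = 'prefix1_FOO') =~ s/prefix1_FOO/prefix2_BAR/g;

print "$with_value\n";     # prefix1_foo
print "$with_literal\n";   # prefix2_BAR
```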

I looked online, and someone suggested using either ${\FOO} or @{[FOO]}. Someone else mentioned (?{FOO}). I was wondering if anyone could shed some light on which of these is correct, and whether there's any advantage to any of them. Alternatively, is it better to just use a non-constant variable? (Performance is a factor in my case.)


Solution

  • The problem shown is due to those constants being barewords (built at compile time).

    Constants defined using this module cannot be interpolated into strings like variables.

    In the current implementation (of the constant pragma) they are "inlinable subroutines".
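    A small sketch showing that a constant made with the constant pragma really is a subroutine: it can be called and referenced like one, but a double-quoted string never calls it. The variable names are illustrative only.

```perl
use strict;
use warnings;
use constant FOO => 'foo';

# use constant installs a subroutine named FOO in the current package.
my $is_sub   = defined &FOO ? 1 : 0;
my $by_call  = FOO();          # call it like any other sub
my $code_ref = \&FOO;          # take a reference to it
my $by_ref   = $code_ref->();

# A double-quoted string only interpolates variables, and FOO is not one,
# so the name stays as plain text.
my $in_string = "value: FOO";

print "$is_sub $by_call $by_ref [$in_string]\n";   # 1 foo foo [value: FOO]
```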

    This problem can be solved nicely by using a module like Const::Fast:

    use strict;
    use warnings;
    use Const::Fast;
    
    const my $foo => 'FOO';
    const my $bar => 'BAR';
    
    my $var = 'prefix1_FOO_more';
    
    $var =~ s/prefix1_$foo/prefix2_$bar/g;   # $var is now 'prefix2_BAR_more'
    

    Now they do get interpolated. Note that more complex replacements may need /e.
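    A sketch of a replacement that needs /e. Const::Fast is a CPAN module, so plain lexicals stand in for its constants here (an assumption for the sake of a self-contained example: const() sets an ordinary variable and then makes it read-only, so interpolation behaves the same). With /e the replacement side is evaluated as Perl code, which also lets a use-constant constant be called there directly.

```perl
use strict;
use warnings;
use constant BAR => 'BAR';

# Hypothetical stand-ins for: const my $foo => 'FOO'; const my $bar => 'BAR';
my $foo = 'FOO';
my $bar = 'BAR';

my $var = 'prefix1_FOO_more prefix1_FOO_again';

# With /e the replacement is code, not a string, so it can be any expression;
# here lc() lowercases the replacement value for every match (/g).
$var =~ s/prefix1_$foo/'prefix2_' . lc($bar)/ge;

# Because the replacement is code, the constant BAR is simply called.
my $other = 'prefix1_FOO';
$other =~ s/prefix1_$foo/'prefix2_' . BAR/e;

print "$var\n";     # prefix2_bar_more prefix2_bar_again
print "$other\n";   # prefix2_BAR
```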

    These are built at runtime, so you can assign the results of expressions to them. In particular, you can use the qr operator, for example:

    const my $patt => qr/$foo/i;  # case-insensitive 
    

    The qr operator is the recommended way to build regex patterns. (It interpolates unless the delimiter is a single quote, as in qr'...'.) The performance gain is usually tiny, but you get a proper regular expression object, which can be built and used as such (and in this case is a constant as well).
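    A sketch of using a qr// object both on its own and inside a larger pattern. The plain lexical $foo is a hypothetical stand-in for a Const::Fast constant; the sample strings are made up. It also shows that the /i flag of the qr// object covers only its own part of the combined pattern.

```perl
use strict;
use warnings;

my $foo  = 'foo';         # stand-in for: const my $foo => 'foo';
my $patt = qr/$foo/i;     # compiled, case-insensitive regex object

# A qr// object can be matched against directly...
my $direct = 'FOO' =~ $patt ? 1 : 0;

# ...or interpolated into a larger pattern, where /i applies only to $patt.
(my $var = 'prefix1_Foo_more') =~ s/prefix1_$patt/prefix2_bar/g;
my $outer = 'PREFIX1_foo' =~ /^prefix1_$patt/ ? 1 : 0;   # 'prefix1_' is case-sensitive

print "$direct $var $outer\n";   # 1 prefix2_bar_more 0
```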

    I readily recommend the Const::Fast module over the other one, and in fact over all others at this time. See a recent article with a detailed discussion of both. Here is a review of many other options.

    I strongly recommend using a constant (of your chosen sort) for things meant to be read-only. That is good for the health of the code, and of the developers who come into contact with it (yourself in the proverbial six months included).


    Constants made with the constant pragma are subroutines, so we need to be able to run code in order to have them evaluated and replaced by their values. We can't just "interpolate" (evaluate) one as we would a variable, since it isn't a variable.

    One way to run code inside a string (which must be interpolated, so effectively double-quoted) is to dereference: the dereferencing block can hold an expression, and that expression gets evaluated. So we first need to make a reference, which is then dereferenced. So either

    say "@{ [FOO] }";  # make array reference, then dereference
    

    or

    say "${ \FOO }";   # make scalar reference then dereference
    

    prints foo. See the docs for why this works and for examples. Thus one can do the same inside a regex, in both the matching and the replacement parts:

    s/prefix1_${\FOO}/prefix2_${\BAR}/g;
    

    (or with @{[...]}), since both parts are interpolated like double-quoted strings.
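    A runnable sketch of both dereferencing tricks, first in plain strings and then on both sides of s///. The variable names and the sample input are illustrative only.

```perl
use strict;
use warnings;
use constant FOO => 'foo';
use constant BAR => 'bar';

# Both forms run the code inside the block and interpolate the result.
my $via_array  = "@{[ FOO ]}";   # build an array ref [FOO], then dereference it
my $via_scalar = "${\ FOO }";    # build a scalar ref \FOO, then dereference it

# The same works in the pattern and in the replacement of s///.
(my $var = 'prefix1_foo_more') =~ s/prefix1_${\FOO}/prefix2_${\BAR}/g;

print "$via_array $via_scalar $var\n";   # foo foo prefix2_bar_more
```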

    Which is "better"? These are tricks. There is rarely, if ever, a need for them, and they have a very good chance of confusing the reader. So I just wouldn't recommend resorting to these at all.

    As for (?{ code }), that is a regex feature whereby code is evaluated inside a pattern (matching side only). It is complex and tricky and very rarely needed. See perlretut and perlre for details.
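    A sketch of why (?{FOO}) does not do what the question wants: the code runs during the match, but its result is not matched as text; it is stored in $^R instead. The related (??{ code }) construct, which the answer does not cover, is the one whose result is used as a subpattern. The sample strings are made up.

```perl
use strict;
use warnings;
use constant FOO => 'foo';

my $str = 'prefix1_foo';

# (?{ code }) runs FOO but matches no text, so 'foo' is left unmatched
# before the end anchor and the match fails.
my $with_code_block = $str =~ /^prefix1_(?{ FOO })$/ ? 1 : 0;

# The value of the code block ends up in $^R.
'x' =~ /x(?{ FOO })/;
my $last_result = $^R;

# (??{ code }) uses the returned value as a subpattern, so this matches.
my $with_subpattern = $str =~ /^prefix1_(??{ FOO })$/ ? 1 : 0;

print "$with_code_block $last_result $with_subpattern\n";   # 0 foo 1
```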

    Discussing the speed of these things isn't really relevant. They are certainly outside the realm of clean and idiomatic code, while you'd be hard pressed to detect any runtime differences.

    But if you must use one of these, I'd much rather interpolate inside a scalar reference via a trick than reach for a complex regex feature.
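    Since performance was raised in the question, here is a sketch of how one might check it with the core Benchmark module. The variant names and the iteration count are arbitrary choices, and no particular timing result is implied; each variant also returns its result so one can confirm they all do the same substitution.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use constant FOO => 'foo';
use constant BAR => 'bar';

my ($foo, $bar) = (FOO, BAR);   # plain lexical copies of the constants
my $input = 'prefix1_foo_more';

# Each variant substitutes on a fresh copy of $input and returns the result.
my %variants = (
    scalar_trick => sub { (my $s = $input) =~ s/prefix1_${\FOO}/prefix2_${\BAR}/g;   $s },
    array_trick  => sub { (my $s = $input) =~ s/prefix1_@{[FOO]}/prefix2_@{[BAR]}/g; $s },
    lexical_var  => sub { (my $s = $input) =~ s/prefix1_$foo/prefix2_$bar/g;         $s },
);

# Runs each variant 200_000 times and prints a comparison table.
cmpthese(200_000, \%variants);
```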

    Not a HASH reference

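    In rare cases the ${\FOO} solution needs extra tweaking, as for ${\FOO}{6,20}. There Perl takes the {6,20} as a hash subscript on the dereferenced value rather than as a quantifier, and complains

    Not a HASH reference at ...

    The fix is to add extra clustering, like this: (?:${\FOO}){6,20}. The group also makes the quantifier apply to the whole value of FOO rather than only to its last character.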
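    A sketch of the grouped form in use, with a hypothetical two-character constant and the quantifier {2,3} in place of {6,20} so the sample strings stay short.

```perl
use strict;
use warnings;
use constant FOO => 'ab';

# The ')' between the closing brace and '{2,3}' keeps Perl from reading the
# quantifier as a hash subscript, and the group repeats all of 'ab'.
my $two   = 'abab'   =~ /^(?:${\FOO}){2,3}$/ ? 1 : 0;
my $three = 'ababab' =~ /^(?:${\FOO}){2,3}$/ ? 1 : 0;
my $one   = 'ab'     =~ /^(?:${\FOO}){2,3}$/ ? 1 : 0;   # too few repetitions

print "$two $three $one\n";   # 1 1 0
```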