
Perl: Capturing variable cannot be set undef?


I have several regexps with capturing groups, and the capture variable apparently retains the value from the last successful match:

# Two scalars to use for regexp
$x = 'abc'; 
$y = 'def'; 

# first regexp
$x =~ /^(ab)/; 
$x = $1; 

# second regexp
$y =~ /^(de)/; 
$y = $1; 
print "$x\n$y";

The output is:

ab
de

Here the one-liner version:

perl -e "$x='abc'; $y='def'; $x =~ /^(ab)/; $x=$1; $y =~ /^(de)/; $y=$1; print \"$x\n$y\";"

If $y='def' is changed to $y='zdef':

perl -e "$x='abc'; $y='zdef'; $x =~ /^(ab)/; $x=$1; $y =~ /^(de)/; $y=$1; print \"$x\n$y\";"

the output is:

ab
ab

If I try to set $1=undef after $x=$1 to clear the current value of $1:

perl -e "$x='abc'; $y='zdef'; $x =~ /^(ab)/; $x=$1; $1=undef; $y =~ /^(de)/; $y=$1; print \"$x\n$y\";"

the output is:

Modification of a read-only value attempted at -e line 1.

Obviously, capturing variables can't be changed.

I'm wondering how I can cope with this problem. The result I would like to have is:

ab
..

where .. means "empty". This is what already happens when the first regexp fails to match ($x='zabc'):

perl -e "$x='zabc'; $y='def'; $x =~ /^(ab)/; $x=$1; $y =~ /^(de)/; $y=$1; print \"$x\n$y\";"

..
de

Solution

  • You need to use the capture variables $1 (and $2, $3, etc.) carefully. They are set only by a successful pattern match; a failed match leaves the values from the previous successful match in place, so you have to make sure the match you are reading from actually succeeded. man perlvar states (the emphasis is on successful):

           $<digits> ($1, $2, ...)
                   Contains the subpattern from the corresponding set of capturing
                   parentheses from the last successful pattern match, ...
    

    Typically, you would do this:

    if ('abc' =~ /^(ab)/) {
        $x = $1;
    }
    if ('zdef' =~ /^(de)/) {
        $y = $1;
    }
    

    This way, you never get the wrong value assigned.

    There are, however, other ways to do this. The pattern match itself gives a return value, which depends on the context.

    $n   = 'abc' =~ /^(ab)/;        # $n = 1 for "true". This is scalar context
    ($n) = 'abc' =~ /^(ab)/;        # $n = 'ab', the captured string. This is list context
    $n = () = 'abc' =~ /(.)/g;      # $n = 3, for 3 matches. /g gives multiple matches
    ($f, $g) = 'abc' =~ /(.)/g;     # $f = 'a', $g = 'b'. List context
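
Applied to the question, the list-context form can be sketched as below. This is a minimal illustration, not the only way to do it: `first_capture` is a hypothetical helper name introduced here, and the `'..'` placeholder only makes the undefined value visible in the output.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first capture group, or undef when the
# pattern does not match. A failed match in list context yields an empty
# list, so a stale $1 from an earlier match is never read.
sub first_capture {
    my ($string, $pattern) = @_;
    my ($captured) = $string =~ $pattern;
    return $captured;
}

my $x = first_capture('abc',  qr/^(ab)/);   # 'ab'
my $y = first_capture('zdef', qr/^(de)/);   # undef, because the match failed

# Print '..' in place of undef, as in the desired output from the question.
print join("\n", map { defined $_ ? $_ : '..' } $x, $y), "\n";
```

The same idea works inline without a helper, for example `my ($y) = 'zdef' =~ /^(de)/;`, which leaves `$y` undefined when the match fails.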