
Perl functions that modify $_


I'm trying to expand my usage of implicit $_ (the global "topic" variable) in my code. Perlmonks has this (outdated?) article on functions which accept $_ in the absence of explicit variables.

The problem I'm having is that I don't know which functions set $_. I know that at least map, grep, and for/foreach will alter the value of $_, but I assume there must be more. I am also unclear on any scope issues relating to $_, as in:

for (@array_of_array_refs)
{
  for (@$_)
  {
    print;
  }
  print;  # what does this print?
}

Is there a list of functions, or a set of guidelines to follow, so I will know intuitively how to avoid clobbering $_?


Solution

  • Steffen Ullrich's answer is misleading, so I'll respond here. I might have missed a few things, but it's late. And Learning Perl already explains it all. ;)

    The local operator does not work with lexical scope. Its effect is not limited to the block it's in, despite what he says. People typically misunderstand this because they don't try it. Terms like "outside" and "inside" are misleading and dangerous for local.

    Consider this use, where there's a function that prints the global value of $_:

    $_ = 'Outside';
    show_it();
    inside();
    $_ = 'Outside';
    show_it();
    
    sub show_it { print "\$_ is $_\n"; }
    
    sub inside {
        local $_;
    
        $_ = 'Inside';
        show_it();
        }
    

    When you run this, you see that the value of $_ set inside the sub's block is visible to show_it, which is defined outside that block:

    $_ is Outside
    $_ is Inside
    $_ is Outside
    

    The local operator works on package variables. It temporarily gives the variable a new value until the end of the enclosing block. Because it is still a package variable, though, that new value is visible everywhere in the program until local's scope ends. The operator has a lexical scope, but its effect is everywhere: you are giving a global variable a temporary value, and that variable is still global. A localized variable has a global effect but a lexical lifetime.
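    To see the "lexical lifetime" part, here is a small sketch (record and @seen are hypothetical names for illustration) where the block is left early through die inside an eval. Perl still restores the localized $_ on the way out, so code that runs afterward sees the old value again:

```perl
use strict;
use warnings;

my @seen;                          # what record() saw each time it ran
sub record { push @seen, $_ }      # reads the global $_

$_ = 'Outside';
eval {
    local $_ = 'Inside';           # temporary value until this block is left
    record();                      # sees 'Inside'
    die "bail out\n";              # leaving the block early still restores $_
};
record();                          # sees 'Outside' again
print "$_\n" for @seen;            # Inside, then Outside
```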

    As I wrote before, it's wrong to talk about "inside" and "outside" with local. It's "before" and "after". I'll show a bit more of that coming up, where even time disintegrates.

    The my operator is completely different. It does not work with package variables at all. These "lexical variables" don't exist outside their scope (even though black magic modules such as PadWalker can look at them). There's no way for any other part of the program to see them. They are visible only in their own scope and in sub-scopes created within it.
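    The same local versus my contrast can be shown with an ordinary package variable, which works on any current perl. In this sketch, $topic, show_topic, with_local, and with_my are made-up names: local gives the package variable a temporary value that the called sub sees, while my creates a separate variable that the called sub never sees:

```perl
use strict;
use warnings;

our $topic = 'package value';      # a package (global) variable, $main::topic

sub show_topic { return $topic }   # always reads the package variable

sub with_local {
    local $topic = 'localized';    # temporary value, visible to anything called from here
    return show_topic();
}

sub with_my {
    my $topic = 'lexical';         # a separate variable that only this block can see
    return show_topic();
}

my $from_local = with_local();
my $from_my    = with_my();
print "local: $from_local\n";      # local: localized
print "my:    $from_my\n";         # my:    package value
```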

    Perl v5.10 added a lexical version of $_. It was later marked experimental and then removed entirely in v5.24, so don't use it (see also The good, the bad, and the ugly of lexical $_ in Perl 5.10+). I can make my previous example use it, although this only compiles on the perls that still had the feature:

    use v5.10;
    
    $_ = 'Outside';
    show_it();
    inside();
    $_ = 'Outside';
    show_it();
    
    sub show_it { print "\$_ is $_\n"; }
    
    
    sub inside {
        my $_;
        $_ = 'Inside';
        show_it();
        }
    

    Now the output is different. The lexical $_ behaves like any other lexical variable. It does not affect anything outside its scope, again, because these variables only exist in their lexical scope:

    $_ is Outside
    $_ is Outside
    $_ is Outside
    

    But, to answer the original question: the Perlmonks post Builtin functions defaulting to $_ is still good, but I don't think it's relevant here. Those functions use $_; they don't set it.

    The big thing to know about Perl is that there is no short answer. Perl does the thing that makes sense, not the thing that makes it consistent. It is, after all, a post-modern language.

    The way to not worry about changing $_ is to not change it. Avoid using it. We have lots of similar advice in Effective Perl Programming.

    foreach

    The looping constructs foreach and its synonym for use a localized version of $_ to refer to the current topic. Everything inside the loop, including anything the loop calls, sees the current topic:

    use v5.10;
    
    $_ = 'Outside';
    show_it();
    sub show_it { say "\$_ is $_"; }
    
    my @array = 'a' .. 'c';
    foreach ( @array ) {
        show_it();
        $_++
        }
    
    say "array: @array";
    

    Notice @array after the foreach loop. Even though foreach localizes $_, Perl aliases the value rather than copying it. Changing the control variable changes the original value, even if that value is in an outer lexical scope:

    $_ is Outside
    $_ is a
    $_ is b
    $_ is c
    array: b c d
    

    Don't use $_ as the control variable. I only use the default in really short programs, mostly because I want the control variable to have a meaningful name in big programs.
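    Back to the question's nested loop: this sketch (the sample data and the @seen array are illustrative assumptions) records what $_ holds at each point. The inner loop localizes $_ only for its own duration, so when it finishes, the outer loop's $_ is the array reference again, and the question's second print would show something like ARRAY(0x...):

```perl
use strict;
use warnings;

my @array_of_array_refs = ( [ 'a', 'b' ], [ 'c' ] );
my @seen;                                # what $_ holds at each point

for (@array_of_array_refs) {
    for (@$_) {
        push @seen, "inner:$_";          # $_ is an element of the inner array
    }
    # the inner loop is over, so its localization has ended
    push @seen, 'outer:' . ref($_);      # $_ is the array reference again
}

print "$_\n" for @seen;
```

    With named control variables (for my $ref ... and for my $item ...) the question goes away entirely, because neither loop touches $_.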

    map and grep

    Like foreach, map and grep use $_ for the control variable. You can't use a different variable for these. You can still affect variables outside the scope through that performance-enhancing aliasing I showed in the previous section.

    Again, this means there's some scope leak. If you change $_ inside the block and $_ was one of the items in the input list, the outer $_ changes (this prints From map):

    use v5.10;
    $_ = 'Outside';
    my @transformed = map { $_ = 'From map' } ( $_ );
    say $_;
    

    For moderately complicated inline blocks, I assign $_ to a lexical variable:

    my @output = map { my $s = $_; ... } @input;
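    A short sketch of the leak and the lexical-copy fix (the arrays and the s/a/A/ substitution are only illustrations): the first map modifies its input through the alias, while the second works on a copy:

```perl
use strict;
use warnings;

my @input = ( 'apple', 'banana' );
# s/// changes $_, and $_ is aliased to each element of @input
my @changed = map { s/a/A/; $_ } @input;
# @input is now ( 'Apple', 'bAnana' ) too

my @fresh = ( 'apple', 'banana' );
# copying $_ into a lexical first leaves @fresh alone
my @copied = map { my $s = $_; $s =~ s/a/A/; $s } @fresh;
```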
    

    And if you are really nervous about $_, don't do the evil trick of a map inside a map:

    my @words = map {
        map { split } $_
        } <>;
    

    That's a dumb example, but I've done such things in the past where I needed to turn the topic into a list.

    while( <> )

    Perl has a handy little idiom that assigns the next line from a filehandle to $_. This means that instead of this:

    while( defined( $_ = <> ) )
    

    You can get the exact same thing with:

    while( <> )
    

    But, whatever value ends up in $_ stays in $_.

    $_ = "Outside\n";
    show_it();
    sub show_it { print "\$_ is $_" }
    
    while( <DATA> ) {
        show_it();
        }
    
    show_it();
    
    __DATA__
    first line
    second line
    third line
    

    The output looks a little weird because the last line has no value, but that's the last value assigned to $_: the undef that the line input operator assigned before the defined test stopped the loop:

    $_ is Outside
    $_ is first line
    $_ is second line
    $_ is third line
    $_ is
    

    Put a last in there and the output changes:

    $_ = "Outside\n";
    show_it();
    sub show_it { print "\$_ is $_" }
    
    while( <DATA> ) {
        show_it();
        last;
        }
    
    show_it();
    
    __DATA__
    first line
    second line
    third line
    

    Now the last value assigned was the first line:

    $_ is Outside
    $_ is first line
    $_ is first line
    

    If you don't like this, don't use the idiom:

    while( defined( my $line = <> ) )
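    A sketch of two ways to keep a line-reading loop from leaving a value in $_. It uses an in-memory filehandle (opening a reference to a string) so it runs without an input file, and count_nonblank is a hypothetical helper. The first loop reads into a named lexical; the second keeps the while( <$in> ) idiom but localizes $_ inside the sub, which matters when the caller's $_ is aliased to an array element by foreach:

```perl
use strict;
use warnings;

my $text = "first line\n\nthird line\n";

# Option 1: read into a named lexical, so $_ is never assigned
$_ = 'Outside';
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my @lines;
while ( defined( my $line = <$fh> ) ) {
    chomp $line;
    push @lines, $line;
}
close $fh;
# $_ is still 'Outside' here

# Option 2: a helper that wants the while( <$in> ) idiom localizes $_
# first, so the caller's $_ comes back when the sub returns
sub count_nonblank {
    my ($string) = @_;
    local $_;
    open my $in, '<', \$string or die "Cannot open in-memory handle: $!";
    my $count = 0;
    while (<$in>) {                # assigns each line to the localized $_
        $count++ if /\S/;
    }
    close $in;
    return $count;
}

my @docs = ( "a\n\nb\n", "\nc\n" );
my @counts;
for (@docs) {                      # $_ is aliased to each element of @docs
    push @counts, count_nonblank($_);
}
# @docs is unchanged; without the local $_, each element would end up undef
```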
    

    Pattern matching

    The substitution operator, s///, binds to $_ by default and can change it (that's sorta the point). But, with v5.14, you can use the /r flag, which leaves the original alone and returns the modified version.
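    A minimal sketch of the difference, assuming v5.14 or later: with /r the substitution returns a changed copy and leaves $_ (or a map's aliased input) alone. The @words and @swapped arrays are only illustrations:

```perl
use v5.14;                        # /r needs v5.14; this also turns on strict
use warnings;

$_ = 'Outside';
my $changed = s/Out/In/r;         # returns a modified copy of $_
# $_ is still 'Outside' and $changed is 'Inside'

my @words   = qw(cat hat);
my @swapped = map { s/a/u/r } @words;   # ( 'cut', 'hut' ); @words is untouched
```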

    The match operator, m//, can also change $_. It doesn't change the value, but it can set the match position (the value pos returns). That's how Perl can do global matches in scalar context:

    use v5.10;
    
    $_ = 'Outside';
    show_it();
    sub show_it { say "\$_ is $_ with pos ", pos(); }
    
    foreach my $time ( 1 .. 5 ) {
        my $scalar = m/./g;
        show_it();
        }
    
    show_it();
    

    Some of the bookkeeping attached to $_ changes even though its value stays the same:

    $_ is Outside with pos
    $_ is Outside with pos 1
    $_ is Outside with pos 2
    $_ is Outside with pos 3
    $_ is Outside with pos 4
    $_ is Outside with pos 5
    $_ is Outside with pos 5
    

    You probably aren't going to have a problem with this. You can reset the position with an unsuccessful global match against $_, unless you're using the /c flag. Even though the scalar value didn't change, part of its bookkeeping did. This was one of the problems with lexical $_.
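    A small sketch of that reset, assuming the usual //g and /gc semantics described in perlop; the variable names are illustrative:

```perl
use strict;
use warnings;

$_ = 'Outside';

# scalar-context //g matches advance pos() one character at a time
my $first  = m/./g;
my $second = m/./g;
my $pos_after_two = pos();                                 # 2

# a failed //g match resets the position
my $failed = m/z/g;
my $pos_after_fail = defined( pos() ) ? pos() : 'undef';   # 'undef'

# with /c, a failed match leaves the position where it was
my $again    = m/./g;                                      # matches 'O' again, pos() is 1
my $failed_c = m/z/gc;
my $pos_after_fail_c = pos();                              # still 1
```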

    There's another curious thing that happens with matching. The per-match variables are dynamically scoped. A match in an inner block doesn't replace the values they had in the outer block:

    use v5.10;
    
    my $string = 'The quick brown fox';
    
    OUTER: {
        $string =~ /\A(\w+)/;
        say  "\$1 is $1";
    
        INNER: {
            $string =~ /(\w{5})/;
            say  "\$1 is $1";
            }
    
        say  "\$1 is $1";
        }
    

    The value of $1 in the OUTER scope isn't replaced by the $1 in INNER:

    $1 is The
    $1 is quick
    $1 is The
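    The dynamic scoping also applies across sub calls. In this sketch (second_word is a hypothetical helper), the sub's own successful match sets $1 only until the sub's block exits, so the caller's $1 is intact afterward:

```perl
use strict;
use warnings;

sub second_word {
    my ($text) = @_;
    # a successful match here sets $1 only until this sub's block exits
    return $text =~ /\A\w+\s+(\w+)/ ? $1 : undef;
}

my $string = 'The quick brown fox';
my ( $before, $inner, $after );

if ( $string =~ /\A(\w+)/ ) {
    $before = $1;                               # 'The'
    $inner  = second_word('lazy dog sleeps');   # 'dog'
    $after  = $1;                               # still 'The'
}
```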
    

    If that hurts your head, don't use the per-match variables directly. Copy them into lexical variables right away (and only when you've had a successful match):

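    my $string = 'The quick brown fox';
    
    OUTER: {
        # a list assignment leaves @captures empty if the match fails
        my( @captures ) = $string =~ /\A(\w+)/;
    
        INNER: {
            my $second_word;
            if( $string =~ /(\w{5})/ ) {
                $second_word = $1;  # copy $1 only after the match succeeds
                }
            }
        }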