
Perl - What scopes/closures/environments are producing this behaviour?


Given a root directory, I wish to identify the shallowest directory that contains a .svn directory, and likewise the shallowest directory that contains a pom.xml.

To achieve this I defined the following function:

use File::Find;
sub firstDirWithFileUnder {
    $needle=@_[0];
    my $result = 0;
    sub wanted {
        print "\twanted->result is '$result'\n";
        my $dir = "${File::Find::dir}";

        if ($_ eq $needle and ((not $result) or length($dir) < length($result))) {
            $result=$dir;
            print "Setting result: '$result'\n";
        }
    }
    find(\&wanted, @_[1]);
    print "Result: '$result'\n";
    return $result;
}

...and call it thus:

    $svnDir = firstDirWithFileUnder(".svn",$projPath);
    print "\tIdentified svn dir:\n\t'$svnDir'\n";
    $pomDir = firstDirWithFileUnder("pom.xml",$projPath);
    print "\tIdentified pom.xml dir:\n\t'$pomDir'\n";

There are two situations which arise that I cannot explain:

  1. When the search for a .svn is successful, the value of $result perceived inside the nested subroutine wanted persists into the next call of firstDirWithFileUnder. So when the pom search begins, although the line my $result = 0; still exists, the wanted subroutine sees its value as the return value from the last firstDirWithFileUnder call.
  2. If the my $result = 0; line is commented out, the function still executes properly. This means a) the outer scope (firstDirWithFileUnder) can still see the $result variable and return it, and b) print shows that wanted still sees the $result value from last time, i.e. it seems to have formed a closure that has persisted beyond the first call of firstDirWithFileUnder.

Can somebody explain what's happening, and suggest how I can properly reset the value of $result to zero upon entering the outer scope?


Solution

  • Enabling warnings and then diagnostics (use warnings; use diagnostics;) yields this helpful information, including a solution:

    Variable "$needle" will not stay shared at ----- line 12 (#1)

    (W closure) An inner (nested) named subroutine is referencing a lexical variable defined in an outer named subroutine.

    When the inner subroutine is called, it will see the value of the outer subroutine's variable as it was before and during the first call to the outer subroutine; in this case, after the first call to the outer subroutine is complete, the inner and outer subroutines will no longer share a common value for the variable. In other words, the variable will no longer be shared.

    This problem can usually be solved by making the inner subroutine anonymous, using the sub {} syntax. When inner anonymous subs that reference variables in outer subroutines are created, they are automatically rebound to the current values of such variables.
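
The rebinding that the diagnostic describes can be seen in a minimal sketch. The name make_counter and its starting values are hypothetical, chosen only for illustration: each call creates a new anonymous sub that captures the $count belonging to that particular call.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an anonymous sub bound to this call's $count.
sub make_counter {
    my $count = shift;                 # a fresh lexical on every call
    return sub { return ++$count; };   # captures the current $count
}

my $from_zero = make_counter(0);
my $from_ten  = make_counter(10);

print $from_zero->(), "\n";   # 1
print $from_zero->(), "\n";   # 2
print $from_ten->(),  "\n";   # 11 (unaffected by $from_zero)
```

Because the two counters do not interfere with each other, each anonymous sub evidently holds its own variable, which is the behaviour a named inner sub does not give you.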


    $result is lexically scoped, meaning a brand new variable is allocated every time you call &firstDirWithFileUnder. sub wanted { ... } is a compile-time subroutine declaration, meaning it is compiled by the Perl interpreter one time and stored in your package's symbol table. Since it contains a reference to the lexically scoped $result variable, the subroutine definition that Perl saves will only refer to the first instance of $result. The second time you call &firstDirWithFileUnder and declare a new $result variable, this will be a completely different variable than the $result inside &wanted.

    You'll want to change your sub wanted { ... } declaration to a lexically scoped, anonymous sub:

    my $wanted = sub {
        print "\twanted->result is '$result'\n";
        ...
    };
    

    and invoke File::Find::find as

    find($wanted, $_[1])
    

    Here, $wanted is a run-time declaration for a subroutine, and it gets redefined with the current reference to $result in every separate call to &firstDirWithFileUnder.
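
Putting the pieces together, the whole function might look like the sketch below. It is one possible rewrite rather than the only one, and it makes some assumptions beyond the original: the name first_dir_with_file_under is new, strict and warnings are enabled, $needle is made lexical, the debugging prints are dropped, and undef (instead of 0) is returned when nothing is found.

```perl
use strict;
use warnings;
use File::Find;

# Returns the shortest-path directory under $root that directly contains
# an entry named $needle, or undef if there is none (an assumption; the
# original code used 0 as the "not found" value).
sub first_dir_with_file_under {
    my ($needle, $root) = @_;
    my $result;    # fresh variable on every call

    # Anonymous sub: created at run time, so it captures this call's
    # $needle and $result rather than those of the first call.
    my $wanted = sub {
        return unless $_ eq $needle;
        my $dir = $File::Find::dir;
        if (!defined $result or length($dir) < length($result)) {
            $result = $dir;
        }
    };

    find($wanted, $root);
    return $result;
}
```

It would be called just as before, for example first_dir_with_file_under(".svn", $projPath), and successive calls no longer see each other's $result.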


    Update: This code snippet may prove instructive:

    sub foo {
        my $foo = 0;  # lexical variable
        $bar = 0;     # global variable
        sub compiletime {
            print "compile foo is ", ++$foo, " ", \$foo, "\n";
            print "compile bar is ", ++$bar, " ", \$bar, "\n";
        }
        my $runtime = sub {
            print "runtime foo is ", ++$foo, " ", \$foo, "\n";
            print "runtime bar is ", ++$bar, " ", \$bar, "\n";
        };
        &compiletime;
        &$runtime;
        print "----------------\n";
        push @baz, \$foo;  # explained below
    }
    &foo for 1..3;
    

    Typical output:

    compile foo is 1 SCALAR(0xac18c0)
    compile bar is 1 SCALAR(0xac1938)
    runtime foo is 2 SCALAR(0xac18c0)
    runtime bar is 2 SCALAR(0xac1938)
    ----------------
    compile foo is 3 SCALAR(0xac18c0)
    compile bar is 1 SCALAR(0xac1938)
    runtime foo is 1 SCALAR(0xa63d18)
    runtime bar is 2 SCALAR(0xac1938)
    ----------------
    compile foo is 4 SCALAR(0xac18c0)
    compile bar is 1 SCALAR(0xac1938)
    runtime foo is 1 SCALAR(0xac1db8)
    runtime bar is 2 SCALAR(0xac1938)
    ----------------
    

    Note that the compile time $foo always refers to the same variable SCALAR(0xac18c0), and that this is also the run time $foo THE FIRST TIME the function is run.

    The last line of &foo, push @baz,\$foo is included in this example so that $foo doesn't get garbage collected at the end of &foo. Otherwise, the 2nd and 3rd runtime $foo might point to the same address, even though they refer to different variables (the memory is reallocated each time the variable is declared).
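
The point about keeping a reference alive can be checked directly with a small sketch. The names fresh_lexical and @kept are hypothetical: while every scalar is still referenced, each call is forced to produce a distinct address.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my @kept;    # holds references so that no lexical is freed early

sub fresh_lexical {
    my $value = 0;          # a new scalar on every call
    push @kept, \$value;    # keep it alive after the sub returns
    return \$value;
}

my @refs = map { fresh_lexical() } 1 .. 3;

# Count the distinct memory addresses among the returned references.
my %seen;
$seen{ refaddr($_) }++ for @refs;

print scalar(keys %seen), " distinct addresses\n";   # 3 distinct addresses
```

Without the push, Perl would be free to reuse the same slot for each new $value, so the addresses could coincide even though the variables are logically different.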