Local variable visibility in closures vs. local `sub`s


Perl 5.18.2 accepts "local subroutines", it seems.

Example:

sub outer()
{
    my $x = 'x';   # just to make a simple example

    sub inner($)
    {
        print "${x}$_[0]\n";
    }

    inner('foo');
}

Without "local subroutines" I would have written:

#...
    my $inner = sub ($) {
        print "${x}$_[0]\n";
    };

    $inner->('foo');
#...

And most importantly I would consider both to be equivalent.

However the first variant does not work as Perl complains:

Variable $x is not available at ...

where ... describes the line where $x is referenced in the "local subroutine".

Can anyone explain this? Are Perl's local subroutines fundamentally different from Pascal's local subroutines?


Solution

  • The term "local subroutine" in the question seems to be referring to lexical subroutines. These are private subroutines visible only within the scope (block) where they are defined, after the definition; just like private variables.

    They are, however, defined (or pre-declared) with my or state, as in my sub subname { ... }

    Just writing sub subname { ... } inside another subroutine doesn't make it "local" (in any version of Perl). It is compiled just as if it were written alongside that other subroutine, and it is placed in their package's symbol table (main::, for example).
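
    As a small sketch of that point (the sub names and strings here are only illustrative), a plain named sub written inside another one can be called from file scope, even if the enclosing sub never runs:

    ```perl
    use strict;
    use warnings;

    sub outer {
        # A plain named sub: compiled once and installed in the package (main::),
        # even though it is written inside outer().
        sub inner { return "inner got: @_" }

        return inner('from outer');
    }

    # inner() is callable from file scope, without outer() ever having run.
    my $direct = main::inner('from top level');
    my $exists = defined(&main::inner) ? 1 : 0;

    print "$direct\n";                    # inner got: from top level
    print "in symbol table: $exists\n";   # in symbol table: 1
    ```

    Nothing here is captured from outer, so this only demonstrates visibility, not the closure problem discussed further down.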


    The question mentions closures in the title, so here is a comment on that.

    A closure in Perl is a structure in a program, normally a scalar variable holding a reference to a sub, which carries the environment (variables) from its scope at the time of its (runtime) creation. See also the perlfaq7 entry on it. It is messy to explain. For example:

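    use feature 'say';   # say needs this (or use v5.10 or later)

    sub gen {
        my $args = "@_";

        # anonymous sub capturing $args; the assignment needs its semicolon
        my $cr = sub { say "Closed over: |$args|. Args for this sub: @_" };
        return $cr;
    }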
    
    my $f = gen( qw(args for gen) );
    
    $f->("hi closed");
    # Prints:
    # Closed over: |args for gen|. Args for this sub: hi closed
    

    The anonymous sub "closes over" the variables in scope where it's defined, in the sense that when its generating function returns the reference and goes out of scope, those variables live on because that reference still exists. Anonymous subs are created at runtime: every time the generating function is called, its lexicals are made anew and so is the anon sub, so the sub always has access to the current values. Thus the returned reference to the anon sub uses lexical data which would otherwise be gone. A little piece of magic.
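
    To see that each call to the generating function makes a fresh set of lexicals and a fresh anon sub, here is a minimal sketch. The name make_counter and the starting values are made up for illustration:

    ```perl
    use strict;
    use warnings;

    # Hypothetical generator: each call creates a new $count and a new anon sub over it.
    sub make_counter {
        my $count = $_[0] // 0;
        return sub { return ++$count };
    }

    my $c1 = make_counter(10);
    my $c2 = make_counter(100);

    # The two closures keep separate copies of $count.
    my @seen = ( $c1->(), $c1->(), $c2->(), $c1->() );
    print "@seen\n";   # 11 12 101 13
    ```

    Calling $c2 does not disturb the count held by $c1, since each code reference carries its own environment.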

    Back to the question of "local" subs. If we want to introduce actual closures to the question, we'd need to return a code reference from the outer subroutine, like

    sub outer {
        my $x = 'x' . "@_";
        return sub { say "$x @_" }
    }
    my $f = outer("args");
    $f->( qw(code ref) );   # prints:  xargs code ref
    

    Or, per the main question, as introduced in v5.18.0 and stable from v5.26.0, we can use a named lexical (truly nested!) subroutine

    sub outer {
        my $x = 'x' . "@_";
        
        my sub inner { say "$x @_" };
    
        return \&inner;
    }
    

    In both cases, my $f = outer(...); receives the code reference returned from outer, which correctly uses outer's lexical variables ($x) with their most current values.
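
    A short sketch of the lexical-sub variant called more than once, assuming Perl v5.26 or later so that my sub needs no experimental feature flag. The arguments are arbitrary:

    ```perl
    use v5.26;       # enables strict and say; lexical subs are stable from here on
    use warnings;

    sub outer {
        my $x = 'x' . "@_";

        # A new copy of inner is made each time this block is entered,
        # so it sees the $x of the current call.
        my sub inner { return "$x @_" }

        return \&inner;
    }

    my $f = outer('one');
    my $g = outer('two');

    my @out = ( $f->('a'), $g->('b'), $f->('c') );
    say for @out;
    # xone a
    # xtwo b
    # xone c
    ```

    Each returned reference keeps the $x from the call that produced it, just as the anonymous-sub version does.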

    But we cannot use a plain named sub inside outer for a closure

    sub outer {
        ...
    
        sub inner { ... }  # misleading, likely misguided and buggy
    
        return \&inner;    # won't work correctly
    }
    

    This inner is made at compile time and is global, so any variables it uses from outer will have their values baked in from when outer was called the first time. So inner will be correct only until outer is called the next time, when the lexical environment in outer gets remade but inner doesn't. As an example I can readily find this post; see also the entry in perldiag (or add use diagnostics; to the program).
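
    The failure can be reproduced with a sketch along these lines. The values 'first' and 'second' are arbitrary, and the closure warning category is switched off only to keep the output clean:

    ```perl
    use strict;
    use warnings;
    no warnings 'closure';   # remove this line to see perl's compile-time warning

    sub outer {
        my $x = shift;

        # Plain named sub: bound to the first $x only, never remade.
        sub inner { return $x }

        return "outer=$x inner=" . inner();
    }

    my @results = ( outer('first'), outer('second') );
    print "$_\n" for @results;
    # outer=first inner=first
    # outer=second inner=first
    ```

    On the second call outer gets a fresh $x, while inner still refers to the one from the first call.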


    In my view, a closure is in a way a poor man's object: it has functionality and data, made elsewhere at another time, which can be used with data passed to it (and both can be updated).
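
    To make the "poor man's object" remark concrete, here is a rough sketch in which two anon subs share one captured variable. The name make_account and its keys are invented for this example:

    ```perl
    use strict;
    use warnings;

    # Hypothetical constructor: $balance is the private data,
    # the anon subs are the "methods" that close over it.
    sub make_account {
        my $balance = $_[0];
        my %methods = (
            deposit => sub { $balance += $_[0]; return $balance },
            balance => sub { return $balance },
        );
        return \%methods;
    }

    my $acct = make_account(100);
    $acct->{deposit}->(50);
    $acct->{deposit}->(25);

    my $total = $acct->{balance}->();
    print "$total\n";   # 175
    ```

    Nothing outside the two code references can reach $balance, which is roughly what private state in an object gives you.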