Hash keys behavior

perl -Mstrict -wlE 'my %h; say grep 0, $h{poluted}; say keys %h'

prints poluted, i.e. the key has been created, but

perl -Mstrict -wlE 'my %h; say grep 0, my @r= $h{poluted}; say keys %h'

prints no key at all (only empty lines).

I would like to know why the outputs are different.


    Aliasing

    In Perl's looping constructs map, grep, and for, the $_ variable is aliased to the current item. Although $_ may then be read-only, it always refers to a valid scalar value.

    For example, the following code dies:

    $_ = 1 for 1, 2, 3;  # constants are read-only

    but this works:

    my @nums = (1, 2, 3);
    $_ = 1 for @nums;  # @nums isn't read-only

    Note that an assignment copies a value, whereas an alias associates a name with an existing scalar.
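    A minimal sketch of the difference, using only core Perl (the variable names are illustrative): the loop modifies @nums through the alias, the copy made by assignment is independent, and aliasing $_ to literal constants and then assigning to it dies.

```perl
use strict;
use warnings;

my @nums = (1, 2, 3);
$_ *= 10 for @nums;          # $_ aliases each element, so @nums itself changes
print "@nums\n";             # prints "10 20 30"

my @copy = @nums;            # assignment copies the values
$_ = 0 for @copy;            # changes only the copies
print "@nums\n";             # still prints "10 20 30"

# Aliasing $_ to literal constants and assigning to it dies.
my $ok = eval { $_ = 1 for 1, 2, 3; 1 };
if ($ok) { print "no error\n" } else { print "died: $@" }
```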

    The two undef values

    Perl has two kinds of undef:

    • An ordinary scalar can hold the undef value. For example:

      my $foo;  # is this kind of undef
      $foo = 1; # isn't undef any more
    • A special, globally unique scalar that represents a read-only undef value. It is returned, for example, when you access a nonexistent array element in rvalue context. In the Perl API, this is &PL_sv_undef. You can obtain a reference to this value, e.g. \undef, and can alias a variable to it.
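    Both kinds can be told apart from Perl space. This is a minimal sketch (core Perl only, names arbitrary): the ordinary scalar accepts an assignment, while assigning through \undef dies because it refers to the shared read-only undef.

```perl
use strict;
use warnings;

# Kind 1: an ordinary scalar that currently holds undef; it is writable.
my $foo;
my $was_undef = defined $foo ? 0 : 1;    # 1: $foo starts out undef
$foo = 1;                                 # fine: no longer undef

# Kind 2: the global read-only undef (&PL_sv_undef); \undef refers to it.
my $ref = \undef;
my $ok  = eval { $$ref = 1; 1 };          # assigning through the reference dies
my $err = $@;
if ($ok) { print "assigned\n" } else { print "died: $err" }
```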

    The two ways of accessing a hash value

    Internally, hash entries are fetched with hv_fetch or hv_fetch_ent. Among their arguments are the hash, the key, and an lval flag telling them whether the access is read-only or read-write.

    If the access is read-only and the element doesn't exist, a null pointer is returned, which shows up in Perl space as the global read-only undef value. This undef value is not connected to the hash. Ergo, not exists $hash{foo} implies not defined $hash{foo}.

    But if the access is read-write and the element doesn't exist, a new entry is created and returned. This entry holds undef until an assignment sets it to another value.
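    The two access modes can be observed from Perl code. In this minimal sketch, taking a reference with \$h{bar} is used as one example of a read-write access (any lvalue use would do), and the key names are arbitrary: the rvalue read leaves the hash untouched, while the reference creates an entry that exists but is not yet defined.

```perl
use strict;
use warnings;

my %h;

# Read-only (rvalue) access: a missing key yields undef, and no entry is created.
my $v = $h{foo};
my $foo_exists = exists $h{foo} ? 1 : 0;       # 0

# Read-write (lvalue) access: taking a reference needs a real scalar in the hash,
# so the entry is created; it holds undef until something is assigned to it.
my $r = \$h{bar};
my $bar_exists  = exists  $h{bar} ? 1 : 0;     # 1
my $bar_defined = defined $h{bar} ? 1 : 0;     # 0

$$r = 42;                                       # assignment through the reference
my @keys = sort keys %h;                        # ("bar")
print "keys: @keys, bar = $h{bar}\n";           # keys: bar, bar = 42
```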

    So why doesn't the code in the question work as expected?

    grep 0, $h{polluted}

    Each element of a looping construct's argument list is aliased to $_. If the list items are constants or subroutine calls, nothing spectacular happens. But when they are variable accesses, aliasing implies a read-write access, because $_ has to refer to the variable's actual scalar, which must therefore exist.

    So, to obtain $h{polluted}, Perl accesses the element in read-write mode, which creates the missing entry. The opcodes for this expression (B::Concise -exec output) confirm this:

    3  <0> pushmark s
    4  <#> gv[*h] s
    5  <1> rv2hv sKR/1
    6  <$> const[PV "polluted"] s/BARE
    7  <2> helem sKM/2                # <-- hash element access, "M" flag is set!
    8  <@> grepstart K
    9  <|> grepwhile(other->a)[t2] vK
    a      <$> const[IV 0] s
               goto 9

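    The M stands for MOD, which means an lvalue/read-write access.

    In the second command, my @r= $h{poluted} reads the element as an ordinary rvalue and copies the result into @r. grep then aliases $_ to the element of @r, so the hash entry is never created.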
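    Both commands from the question can be reproduced in one script. This minimal sketch uses three separate, arbitrarily named hashes so the effects don't interfere; the for loop is included because it aliases its list the same way grep does.

```perl
use strict;
use warnings;

my (%direct, %copied, %looped);

# grep aliases $_ to $direct{polluted} itself, so the element is fetched
# read-write and the key is created even though grep's condition is false.
my @g1 = grep 0, $direct{polluted};

# Here the element is only read (rvalue) and copied into @r;
# grep then aliases $_ to $r[0], so %copied is never modified.
my @g2 = grep 0, my @r = $copied{polluted};

# A for loop aliases its list the same way and also creates the key.
for ($looped{polluted}) { }

print "direct: ", join(",", keys %direct), "\n";   # direct: polluted
print "copied: ", join(",", keys %copied), "\n";   # copied:
print "looped: ", join(",", keys %looped), "\n";   # looped: polluted
```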

    Why does this behavior make “sense”

    In for-loops, having $_ be an alias to the current element is genuinely useful, because it lets the loop modify the list in place. In map and grep, aliasing is a performance optimization: it avoids copying each scalar, since an alias only requires copying a single pointer.
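    The aliasing is observable in grep. This sketch (core Perl only; the array name is arbitrary) checks that $_ inside grep is the very same scalar as the array element by comparing reference addresses, and relies on grep returning aliases into the original list, as documented in perlfunc.

```perl
use strict;
use warnings;

my @nums = (1, 2, 3);

# Inside grep, $_ is the element itself, not a copy: compare addresses.
my @is_first = grep { \$_ == \$nums[0] } @nums;   # only the first element matches

# grep also returns aliases into the original list, so modifying what it
# returns (here via for) modifies @nums.
$_ *= 10 for grep { $_ > 1 } @nums;
print "@nums\n";                                   # prints "1 20 30"
```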