
Perl array of hash is adding elements from other arrays


I'm creating an array of hashes in Perl 5.34.0:

#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';
use autodie ':default';
use DDP;

my (@a1, @a2);
my %h = ('a' => 1);
push @a1, \%h; # make @a1 an array of hashes with 'a' defined
$h{b} = 2; # define b within the hash, but don't write to @a1
push @a2, \%h; # push `a` and `b` onto @a2, but *not* @a1
p @a1; # DDP gives "p" which pretty-prints
p @a2;

and this outputs:

[
    [0] {
            a   1,
            b   2
        }
]
[
    [0] {
            a   1,
            b   2
        }
]

The problem is that the b key shows up in @a1, yet $h{b} didn't exist when data was being written to @a1.

I don't want b to show up in @a1, and it shouldn't.

How can I modify %h so that the changes don't magically appear in different arrays?


Solution

  • That code adds a reference to an existing (named) hash,

    push @a1, \%h;
    

    so when @a1 is later queried you see whatever is in the hash at that time. The array element holds only a reference, a pointer of sorts to the same data that the name %h refers to, so it is effectively an alias. So if the hash is written to in the meantime, that is what you'll see via @a1.

    Push it again, as the question does for @a2, and even though the hash has changed you just add the same reference once more: $a1[0] and $a2[0] point to one and the same hash.
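A minimal sketch to confirm that both arrays share one hash. It drops DDP and instead uses refaddr from the core Scalar::Util module (my choice for this illustration) to compare the addresses the two references point to:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my (@a1, @a2);
my %h = (a => 1);
push @a1, \%h;    # stores a reference to %h, not a copy of its data
$h{b} = 2;        # modifies %h itself
push @a2, \%h;    # pushes the very same reference again

# Both elements point at the same hash in memory
print "same hash\n" if refaddr($a1[0]) == refaddr($a2[0]);
print join(',', sort keys %{ $a1[0] }), "\n";    # prints "a,b"
```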

    What you want is to copy the data and add a reference to that copy, an anonymous hash:

    push @a1, { %h };    # but may need deep copy instead
    

    Now the key/value pairs of %h are copied to populate a new anonymous hash constructed by {}, so we have independent data that can be changed only by writing to it via its reference in @a1.
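Applied to the code from the question, a minimal sketch of the fix looks like this. It again uses plain print instead of DDP. The first copy is taken before b is added, so it should contain only a:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my (@a1, @a2);
my %h = (a => 1);
push @a1, { %h };   # copy the current key/value pairs into a new anonymous hash
$h{b} = 2;          # changes %h only; the copy already in @a1 is unaffected
push @a2, { %h };   # another independent copy, now with both a and b

print join(',', sort keys %{ $a1[0] }), "\n";   # prints "a"
print join(',', sort keys %{ $a2[0] }), "\n";   # prints "a,b"
```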

    But note that if values in that hash are themselves references, then it is those references that get copied, and we have the same problem! In that case you need a deep copy, which is best done with a library; Storable::dclone is a good one:

    use Storable qw(dclone);
    
    push @a1, dclone \%h;
    

    Now all of the actual data is copied, and @a1 gets a reference to a fully independent copy.
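The sketch below shows why the shallow copy is not always enough. It uses a hypothetical hash whose list value is an array reference and compares a { %h } copy with a dclone copy:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Storable qw(dclone);

my %h = (name => 'x', list => [1, 2]);   # 'list' holds an array reference (sample data)

my $shallow = { %h };      # copies the top level; 'list' still points at the same array
my $deep    = dclone \%h;  # recursively copies everything, including the array

push @{ $h{list} }, 3;     # modify the nested array through %h

print "shallow: @{ $shallow->{list} }\n";   # prints "shallow: 1 2 3"
print "deep:    @{ $deep->{list} }\n";      # prints "deep:    1 2"
```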


    An important exception, and a frequently used idiom:

    foreach my $elem (@ary) { 
        my %h;
        # ... code that does its work and populates the hash ...
        push @res, \%h;
    }
    

    This is OK because %h is declared inside the loop, so a new hash is created on every iteration. So what happens to the data created in the previous iteration?

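    Since its reference was added to an array, the data itself is kept alive: Perl's reference counting sees that the reference in the array still refers to it, and once %h goes out of scope at the end of the iteration, that reference is the only one left. That is precisely what you want, data accessible only via the array.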

    In such a case this is preferred over push @res, { %h } (which also works), since it avoids copying the data; the data is kept by, so to say, merely changing its ownership.
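Here is a self-contained sketch of the loop idiom. The contents of @ary and the name/length keys are hypothetical sample data. refaddr from the core Scalar::Util module serves only to check that every element of @res is a distinct hash:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my @ary = qw(apple banana cherry);   # hypothetical sample input
my @res;
foreach my $elem (@ary) {
    my %h;                           # a brand-new hash on every iteration
    $h{name}   = $elem;
    $h{length} = length $elem;
    push @res, \%h;                  # the array's reference keeps this hash alive
}

# Count distinct hash addresses: one per iteration
my %seen;
$seen{ refaddr($_) }++ for @res;
print scalar(keys %seen), " distinct hashes\n";   # prints "3 distinct hashes"
print "$_->{name}: $_->{length}\n" for @res;
```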


    Finally, note that the aliasing works in both directions: with push @a1, \%h, if data is changed via @a1, like $a1[0]->{key} = 'val';, then that new value is seen via $h{key} as well.
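A minimal sketch of that reverse direction, writing through the stored reference and reading the value back via %h:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %h = (a => 1);
my @a1;
push @a1, \%h;              # @a1 holds a reference to %h itself

$a1[0]->{key} = 'val';      # write through the reference...
print "$h{key}\n";          # ...and it shows up in %h: prints "val"
```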