
Unexpected output from Perl Dumper


I was trying to assign one hash to another when I came across this unexpected situation.

I was printing the Data::Dumper output to check that the hash is formed correctly.

Data::Dumper gives the expected output when I iterate through the hash, but it shows an unexpected result when I print the entire hash.

Please see the code snippet below. Any insight will be a great help.

my (@aBugs) = (111,222,333);
my $phBugsRec;
my $phProfiles;
$phProfiles->{profiles} = { 'profile1' => 'default1' };

Forming the final hash:

foreach my $pBugNo(@aBugs){
    $phBugsRec->{bugAttributes}{$pBugNo}{totalEffort} = 0;
    $phBugsRec->{bugAttributes}{$pBugNo}{profiles}    = $phProfiles->{profiles};
}

If I dump the entire hash, I am not getting the expected output:

print '<pre>'.Dumper($phBugsRec).'</pre>';

$VAR1 = {
    'bugAttributes' => {
        '333' => {
            'totalEffort' => 0,
            'profiles' => {
                'profile1' => 'default1'
            }
        },
        '111' => {
            'totalEffort' => 0,
            'profiles' => $VAR1->{'bugAttributes'}{'333'}{'profiles'}
        },
        '222' => {
            'totalEffort' => 0,
            'profiles' => $VAR1->{'bugAttributes'}{'333'}{'profiles'}
        }
    }
};

But when I iterate over the hash, I get the expected output

foreach (sort keys %{ $phBugsRec->{bugAttributes} }){
    print '<pre>'.$_.':'.Dumper($phBugsRec->{bugAttributes}{$_}).'</pre>';
}

111:$VAR1 = {
  'totalEffort' => 0,
  'profiles' => {
    'profile1' => 'default1'
  }
};
222:$VAR1 = {
  'totalEffort' => 0,
  'profiles' => {
    'profile1' => 'default1'
  }
};
333:$VAR1 = {
  'totalEffort' => 0,
  'profiles' => {
    'profile1' => 'default1'
  }
};

Solution

  • As tripleee says in their comment, this is not wrong. I agree it might be unexpected. It happens because you are using the same reference several times in a data structure. It's due to how Perl's references work.
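    If the sharing is not what you want, one option is to give every bug its own copy of the profiles hash. The following is only a sketch built from the variables in the question, and it makes a shallow copy: it assumes the profiles hash holds plain values, because any nested references would still be shared.

    ```perl
    use strict;
    use warnings;
    use Data::Dumper;

    my @aBugs      = (111, 222, 333);
    my $phProfiles = { profiles => { profile1 => 'default1' } };
    my $phBugsRec;

    foreach my $pBugNo (@aBugs) {
        $phBugsRec->{bugAttributes}{$pBugNo}{totalEffort} = 0;

        # { %{ ... } } builds a new anonymous hash with the same keys and
        # values, so each bug gets its own (shallow) copy.
        $phBugsRec->{bugAttributes}{$pBugNo}{profiles} = { %{ $phProfiles->{profiles} } };
    }

    # No reference is used twice, so Dumper has nothing to cross-reference.
    my $dump = Dumper($phBugsRec);
    print $dump;
    ```

    With separate copies, changing the profiles of one bug no longer affects the others, which may or may not be what you need.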

    A short overview of references in Perl

    References are explained in perlref, perlreftut, perldsc and perllol.

    As soon as you have a data structure with more than one level in Perl, every level below the first is stored as a reference. The -> operator dereferences them, so Perl treats them as hashes or arrays again. You are basically traversing a data structure if you say $foo->{bar}->{baz} to get to the inner value.

    If you set $foo->{bar}->{baz} = 123 directly, Perl will create all those references for you automagically (this is called autovivification). But you can also make references yourself.

    my @numbers = (42, 23, 1337);
    my $ref = \@numbers;
    
    print Dumper $ref;
    
    __END__
    $VAR1 = [ 42, 23, 1337 ]
    

    This is a single reference to that array. If you use it more than once in the same data structure, the dump will show that.

    my $hash = { 
        foo => $ref, 
        bar => $ref,
    };
    
    print Dumper $hash;
    
    __END__
    
    $VAR1 = {
          'foo' => [
                     42,
                     23,
                     1337
                   ],
          'bar' => $VAR1->{'foo'}
    };
    

    Looks the same as your example, right? Let's try something else. If you use a reference as a string, Perl will show you its type and address.

    print "$ref";
    
    __END__
    ARRAY(0x25df7b0)
    

    We've all seen that, and we've all thought something is seriously wrong when we first saw it. Let's go back to our $hash from above.

    say $hash->{foo};
    say $hash->{bar};
    
    __END__
    ARRAY(0x16257b0)
    ARRAY(0x16257b0)
    

    As you can see, it's the same address, because it's the same data structure.
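    To make that more tangible, here is a small sketch that compares the two addresses and then changes the array through one of the keys. It reuses @numbers, $ref and $hash from above; refaddr comes from Scalar::Util, which ships with Perl.

    ```perl
    use strict;
    use warnings;
    use Scalar::Util 'refaddr';

    my @numbers = (42, 23, 1337);
    my $ref     = \@numbers;
    my $hash    = { foo => $ref, bar => $ref };

    # refaddr returns the numeric address a reference points to.
    my $same = refaddr( $hash->{foo} ) == refaddr( $hash->{bar} );
    print "same array: ", ( $same ? "yes" : "no" ), "\n";

    # A change made through one key is visible through the other,
    # because both keys point at the one and only array.
    push @{ $hash->{foo} }, 99;
    print "bar now holds: @{ $hash->{bar} }\n";

    __END__
    same array: yes
    bar now holds: 42 23 1337 99
    ```

    The original @numbers array contains the 99 as well, since $ref points at it.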

    Other Perl serializers

    This is what your data structure looks like with Data::Dump.

    do {
      my $a = {
        bugAttributes => {
          111 => { profiles => { profile1 => "default1" }, totalEffort => 0 },
          222 => { profiles => 'fix', totalEffort => 0 },
          333 => { profiles => 'fix', totalEffort => 0 },
        },
      };
      $a->{bugAttributes}{222}{profiles} = $a->{bugAttributes}{111}{profiles};
      $a->{bugAttributes}{333}{profiles} = $a->{bugAttributes}{111}{profiles};
      $a;
    }
    1
    

    Data::Dump is meant for creating output that is both human readable and can be put back into Perl. It's a bit more concise than Data::Dumper. You can see that it also shows the values that are used more than once in your data structure.

    And this is what Data::Printer does to it.

    \ {
        bugAttributes   {
            111   {
                profiles      {
                    profile1   "default1"
                },
                totalEffort   0
            },
            222   {
                profiles      var{bugAttributes}{111}{profiles},
                totalEffort   0
            },
            333   {
                profiles      var{bugAttributes}{111}{profiles},
                totalEffort   0
            }
        }
    }
    

    Data::Printer is meant for human consumption only. You cannot run this as code, but instead it's meant to be easily readable. Again, it also shows that stuff is reused inside of the data structure.

    The conclusion from all of this is that every one of those serializers has to mark reused parts in some special way, because there is no simple literal notation for "this is the same thing as over there". Not even in plain Perl.

    Why you can't see the whole data structure

    If the serializer omitted the fact that parts of the data structure are reused, the serialization would not be reversible. Reading the dump back in would give you a different data structure, one where each part is an independent copy. That's of course not what you would want.
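    If you only want to read the dump and never load it back, Data::Dumper can be told to expand repeated references. The sketch below uses the documented $Data::Dumper::Deepcopy setting on a cut-down version of your data; the variable names $profiles and $data are made up for the example.

    ```perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $profiles = { profile1 => 'default1' };
    my $data     = {
        111 => { profiles => $profiles },
        222 => { profiles => $profiles },
    };

    my $dump;
    {
        # Deepcopy: only cross-reference when needed to break a cycle.
        # Sortkeys: stable key order, purely for readability.
        local $Data::Dumper::Deepcopy = 1;
        local $Data::Dumper::Sortkeys = 1;
        $dump = Dumper($data);
    }
    print $dump;
    ```

    Keep in mind that such a dump no longer records the sharing, so evaluating it would give you independent copies.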

    Serialization without reusing

    To show that your data is in fact not lost, and that this is really just a way to show (and port) that things are reused inside the data structure, I have converted it to JSON using the JSON module. JSON is a portable format that can be used with Perl, but it is not Perl.

    use JSON;
    say JSON->new->pretty->encode($phBugsRec);
    

    Here is the result. It looks more like what you expected.

    {
       "bugAttributes" : {
          "333" : {
             "profiles" : {
                "profile1" : "default1"
             },
             "totalEffort" : 0
          },
          "111" : {
             "totalEffort" : 0,
             "profiles" : {
                "profile1" : "default1"
             }
          },
          "222" : {
             "profiles" : {
                "profile1" : "default1"
             },
             "totalEffort" : 0
          }
       }
    }
    

    That's because JSON is meant to be a portable format. It's for moving data around. There is an agreement on what it can contain, and reusing data is not part of that. Not every language that implements reading and writing JSON supports reuse of partial data structures (1).
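    The price for that is that the sharing is gone after a round trip. Here is a sketch that checks this with JSON::PP, which ships with Perl and has the same interface as the JSON module; using it instead of JSON is my substitution, and $profiles and $data are made-up names.

    ```perl
    use strict;
    use warnings;
    use JSON::PP;
    use Scalar::Util 'refaddr';

    my $profiles = { profile1 => 'default1' };
    my $data     = {
        111 => { profiles => $profiles },
        222 => { profiles => $profiles },
    };

    # Encode to JSON text and decode it again into a fresh Perl structure.
    my $json = JSON::PP->new->canonical->encode($data);
    my $copy = JSON::PP->new->decode($json);

    my $shared_before = refaddr( $data->{111}{profiles} ) == refaddr( $data->{222}{profiles} );
    my $shared_after  = refaddr( $copy->{111}{profiles} ) == refaddr( $copy->{222}{profiles} );

    print "shared before: ", ( $shared_before ? "yes" : "no" ), "\n";
    print "shared after:  ", ( $shared_after  ? "yes" : "no" ), "\n";

    __END__
    shared before: yes
    shared after:  no
    ```

    The values are all still there, but each profiles hash in $copy is now its own independent hash.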

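    It would also just be printed several times if we converted to XML. YAML is a different story: the format has anchors and aliases for repeated nodes, so a YAML dumper may mark the reuse instead of printing the data again.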

    1) I don't have proof for that, but it gets the point across