
What's the difference between a hash and hash reference in Perl?


I would like to properly understand hashes in Perl. I've used Perl intermittently for quite some time, mostly for text processing.

And every time I have to deal with hashes, I get it wrong. I find the hash syntax very cryptic.

A good explanation of hashes and hash references, their differences, when they are required etc. would be much appreciated.


Solution

  • A simple hash is close to an array. Their initializations even look similar. First the array:

    my @last_name = (
      "Ward",   "Cleaver",
      "Fred",   "Flintstone",
      "Archie", "Bunker"
    );
    

    Now let's represent the same information with a hash (aka associative array):

    my %last_name = (
      "Ward",   "Cleaver",
      "Fred",   "Flintstone",
      "Archie", "Bunker"
    );
    

    Although they have the same name, the array @last_name and the hash %last_name are completely independent.

    With the array, if we want to know Archie's last name, we have to perform a linear search:

    my $lname;
    for (my $i = 0; $i < @last_name; $i += 2) {
      $lname = $last_name[$i+1] if $last_name[$i] eq "Archie";
    }
    print "Archie $lname\n";
    

    With the hash, it's much more direct syntactically:

    print "Archie $last_name{Archie}\n";
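
    Much of the "cryptic" feel comes from the sigil changing with what you ask for: % names the whole hash, while a single element is a scalar and so takes $. Here is a minimal sketch of the everyday operations, reusing the data above. The Barney entry and the helper variable names are assumptions added only for illustration.

    ```perl
    use strict;
    use warnings;

    my %last_name = (
      "Ward",   "Cleaver",
      "Fred",   "Flintstone",
      "Archie", "Bunker"
    );

    # One element is a scalar, so the sigil is $ and the subscript uses braces.
    my $archie = $last_name{Archie};

    # Assigning to a new key adds an entry (illustrative, not in the original data).
    $last_name{Barney} = "Rubble";

    # exists tests for a key without creating it.
    my $has_wilma = exists $last_name{Wilma} ? 1 : 0;

    # keys returns the keys in no particular order, so sort for stable output.
    my @first_names = sort keys %last_name;

    # delete removes an entry and returns the value it held.
    my $removed = delete $last_name{Barney};

    print "Archie $archie\n";
    print join(", ", @first_names), "\n";
    ```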
    

    Say we want to represent information with only slightly richer structure:

    • Cleaver (last name)
      • Ward (first name)
      • June (spouse's first name)
    • Flintstone
      • Fred
      • Wilma
    • Bunker
      • Archie
      • Edith

    Before references came along, flat key-value hashes were about the best we could do, but references allow nested structures such as:

    my %personal_info = (
        "Cleaver", {
            "FIRST",  "Ward",
            "SPOUSE", "June",
        },
        "Flintstone", {
            "FIRST",  "Fred",
            "SPOUSE", "Wilma",
        },
        "Bunker", {
            "FIRST",  "Archie",
            "SPOUSE", "Edith",
        },
    );
    

    Internally, the keys and values of %personal_info are all scalars, but the values are a special kind of scalar: hash references, created with {}. The references allow us to simulate "multi-dimensional" hashes. For example, we can get to Wilma via

    $personal_info{Flintstone}->{SPOUSE}
    

    Note that Perl allows us to omit arrows between subscripts, so the above is equivalent to

    $personal_info{Flintstone}{SPOUSE}
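
    To walk a nested structure like this, you can loop over the outer keys and index into each inner hash. The sketch below repeats a compact copy of %personal_info so that it runs on its own. The @lines and @fields arrays are assumptions used only to collect results.

    ```perl
    use strict;
    use warnings;

    my %personal_info = (
        "Cleaver",    { "FIRST", "Ward",   "SPOUSE", "June"  },
        "Flintstone", { "FIRST", "Fred",   "SPOUSE", "Wilma" },
        "Bunker",     { "FIRST", "Archie", "SPOUSE", "Edith" },
    );

    # Outer loop over last names; chained subscripts reach the inner values.
    my @lines;
    for my $last (sort keys %personal_info) {
        push @lines, "$personal_info{$last}{FIRST} and $personal_info{$last}{SPOUSE} $last";
    }
    print "$_\n" for @lines;

    # %{ ... } dereferences a whole inner hash, e.g. to list its keys.
    my @fields = sort keys %{ $personal_info{Flintstone} };
    print "Fields: @fields\n";
    ```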
    

    That's a lot of typing if you want to know more about Fred, so you might grab a reference as sort of a cursor:

    my $fred = $personal_info{Flintstone};
    print "Fred's wife is $fred->{SPOUSE}\n";
    

    Because $fred in the snippet above is a hashref, the arrow is necessary. If you leave it out, Perl reads $fred{SPOUSE} as an element of a hash named %fred. If you have wisely enabled use strict to help catch these sorts of errors, the compiler will complain:

    Global symbol "%fred" requires explicit package name at ...
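
    The two forms can be put side by side. This is only a sketch: the variables %h, $r and $anon, and the PET entry, are hypothetical and not part of the data above.

    ```perl
    use strict;
    use warnings;

    # A hash: declared with %, initialized from a parenthesized list.
    my %h = ("FIRST", "Fred", "SPOUSE", "Wilma");

    # A hash reference: a scalar that refers to a hash.
    my $r    = \%h;                                      # refers to the existing %h
    my $anon = { "FIRST", "Barney", "SPOUSE", "Betty" }; # refers to a new anonymous hash

    # Element access: no arrow for the hash, arrow for the reference.
    my $from_hash = $h{SPOUSE};
    my $from_ref  = $r->{SPOUSE};

    # Whole-hash access: put % in front of the reference to dereference it.
    my @keys_hash = sort keys %h;
    my @keys_ref  = sort keys %$r;

    # $r refers to %h, so a change made through one is visible through the other.
    $r->{PET} = "Dino";
    my $pet = $h{PET};

    print ref($r), "\n";    # ref() reports what kind of thing $r refers to
    ```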
    

    Perl references are similar to pointers in C and C++, but they can never be null. Pointers in C and C++ require dereferencing and so do references in Perl.

    C function parameters (and C++ parameters by default) have pass-by-value semantics: they're just copies, so modifications don't propagate back to the caller. If you want the caller to see the changes, you have to pass a pointer. You can get this effect with references in Perl:

    sub add_barney {
        my($personal_info) = @_;
    
        $personal_info->{Rubble} = {
            FIRST  => "Barney",
            SPOUSE => "Betty",
        };
    }
    
    add_barney \%personal_info;
    

    Without the backslash, %personal_info would be flattened into a list of keys and values in @_, and add_barney would have no way to add an entry to the caller's hash.
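
    To see the difference in practice, here is a small sketch with two hypothetical subs, add_by_ref and add_to_copy (neither name comes from the code above). The first receives a reference. The second receives the flattened list and rebuilds its own hash from it.

    ```perl
    use strict;
    use warnings;

    # Receives a hash reference, so changes reach the caller's hash.
    sub add_by_ref {
        my ($info) = @_;
        $info->{Rubble} = { FIRST => "Barney", SPOUSE => "Betty" };
    }

    # Receives the flattened key/value list and rebuilds a private copy.
    sub add_to_copy {
        my (%info) = @_;
        $info{Rubble} = { FIRST => "Barney", SPOUSE => "Betty" };
        return scalar keys %info;    # number of keys in the private copy
    }

    my %personal_info = (
        Flintstone => { FIRST => "Fred", SPOUSE => "Wilma" },
    );

    # The copy gains a key, but the caller's hash does not.
    my $copy_size  = add_to_copy(%personal_info);
    my $after_copy = exists $personal_info{Rubble} ? 1 : 0;

    # Passing a reference lets the sub modify the caller's hash.
    add_by_ref(\%personal_info);
    my $after_ref = exists $personal_info{Rubble} ? 1 : 0;

    print "after copy: $after_copy, after ref: $after_ref\n";
    ```

    Note that the copy made in add_to_copy is shallow: the inner hash references are shared, so only additions and deletions at the top level stay private.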

    Note also the use of the "fat comma" (=>) above. It autoquotes a bareword identifier on its left and makes hash initializations less syntactically noisy.
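
    As a small illustration, the two initializations below build identical hashes. The autoquoting only applies to simple identifiers, so a key containing a hyphen or a space still needs quotes. The "home-town" entry and the helper sub as_string are made up for this sketch.

    ```perl
    use strict;
    use warnings;

    # Plain commas: every key must be quoted.
    my %plain = ("FIRST", "Fred", "SPOUSE", "Wilma");

    # Fat commas: identifier-like keys on the left are quoted automatically.
    my %fat = (FIRST => "Fred", SPOUSE => "Wilma");

    # Keys that are not simple identifiers still need explicit quotes.
    my %quoted = ("home-town" => "Bedrock");

    # Render a hash in sorted key order so the two can be compared as strings.
    sub as_string {
        my ($h) = @_;
        return join(",", map { "$_=$h->{$_}" } sort keys %$h);
    }

    my $plain_str = as_string(\%plain);
    my $fat_str   = as_string(\%fat);
    print "same contents\n" if $plain_str eq $fat_str;
    ```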