
Dereferencing hash of hashes in Perl


Sorry for the long post; the code should be easy for Perl veterans to understand. I'm new to Perl and I'm trying to figure out this bit of code:

my %regression;

print "Reading regression dir: $opt_dir\n";

foreach my $f ( glob("$opt_dir/*.regress") ) {
    my $name = ( fileparse( $f, '\.regress' ) )[0];
    $regression{$name}{file} = $f;

    say "file $regression{$name}{file}";
    say "regression name $regression{$name}";
    say "regression name ${regression}{$name}";

    &read_regress_file( $f, $regression{$name} );
}


sub read_regress_file {
    say "args @_";

    my $file = shift;
    my $href = shift;

    say "href $href";

    open FILE, $file or die "Cannot open $file: $!\n";

    while ( <FILE> ) {
        next if /^\s*\#/ or /^\s*$/;
        chomp;

        my @tokens = split "=";
        my $key    = shift @tokens;
        $$href{$key} = join( "=", @tokens );
    }

    close FILE;
}

The say lines are things I added to debug.


My confusion is the last part of the subroutine read_regress_file. It looks like href is a reference from the line my $href = shift;. However, I'm trying to figure out how the hash that was passed got referenced in the first place.

%regression is a hash with keys of $name. The .regress files the code reads are simple files that contain variables and their values in the form of:

var1=value
var2=value
...

So it looks like the line

my $name = (fileparse($f,'\.regress'))[0];

is creating the keys as scalars and the line

$regression{$name}{file} = $f;

actually makes $name into a hash.

In my debugging lines

say "regression name $regression{$name}";

prints the reference, for instance

regression name HASH(0x7cd198)

but

say "regression name ${regression}{$name}";

prints a name, like

regression name {filename}

with the file name inside the braces.

However, using

say "regression name $$regression{$name}";

prints nothing.

From my understanding, it looks like regression is an actual hash, but the references are the nested hashes, name.

Why does my dereference test line using braces work, but the other form of dereferencing ($$) not work?

Also, why is the name still surrounded by braces when it prints? Shouldn't I be dereferencing $name instead?

I'm sorry if this is difficult to read. I'm confused about which hash is actually referenced, and how to dereference them if the reference is the nested hash.


Solution

  • This is a tough one. You've found some very awkward code that runs into a surprising corner of Perl's string interpolation, and you're getting confused over dereferencing Perl data structures. Standard Perl installations include the full set of documentation, and I suggest you take a look at perldoc perlreftut which is also available online at perldoc.com

    The most obvious thing is that you are writing very old-fashioned Perl. Using an ampersand & to call a Perl subroutine hasn't been considered good practice since v5.8 was released fourteen years ago

    I don't think there's much need to go beyond your clearly experimental lines at the start of the first for loop. Once you have understood these, the rest should follow

    say "file $regression{$name}{file}";
    say "regression name $regression{$name}";
    say "regression name ${regression}{$name}";
    

    First of all, expanding data structure references within a string is unreliable. Perl tries to do what you mean, but it's very easy to write something ambiguous without realising it. It is often much better to use printf so that you can specify the embedded value separately. For instance

    printf "file %s\n", $regression{$name}{file};
    

    That said, you have a problem. $regression{$name} accesses the element of hash %regression whose key is equal to $name. That value is a reference to another hash, so the line

    say "regression name $regression{$name}";
    

    prints something like

    regression name HASH(0x29348b0)
    

    which you really don't want to see
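
    If what you want to see is the contents of that inner hash rather than its address, one option is to dereference it and print each key and value. This is only a minimal sketch: the test1 entry is made-up data standing in for a real .regress file, and describe_entry is a hypothetical helper that isn't part of your code

    ```perl
    use strict;
    use warnings;

    # Made-up data standing in for one entry of %regression
    my %regression = (
        test1 => { file => 'regress/test1.regress', var1 => 'value' },
    );

    # Hypothetical helper: turn one inner hash into "key=value, key=value",
    # with the keys sorted so the result is predictable
    sub describe_entry {
        my ($href) = @_;
        return join ', ', map { "$_=$href->{$_}" } sort keys %$href;
    }

    # Prints: regression test1: file=regress/test1.regress, var1=value
    printf "regression %s: %s\n", 'test1', describe_entry( $regression{test1} );
    ```

    Here %$href dereferences the whole inner hash, while $href->{$_} fetches a single element from it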

    Your first try $regression{$name}{file} accesses the element of the secondary hash that has the key file. That works fine

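    But ${regression}{$name} should be the same as $regression{$name}. Outside a string it is, but inside a double-quoted string ${regression} is taken as the brace-delimited form of the scalar $regression, which is normally used to separate a variable name from the text that follows it. The {$name} after it is then just a literal pair of braces around the value of $name, which is why you see the name surrounded by braces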
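
    The difference is easier to see side by side. This is a small sketch with made-up data; the scalar $regression is declared only so that the code compiles under use strict and so you can see which variable the string actually reads. In your program that scalar is presumably empty, which would explain why only the braces and the name appear

    ```perl
    use strict;
    use warnings;

    my %regression = ( test1 => { file => 'test1.regress' } );
    my $regression = 'SCALAR';    # unrelated scalar, declared only to show which variable gets read
    my $name       = 'test1';

    my $outside = ${regression}{$name};      # element of %regression, i.e. the hash reference
    my $inside  = "${regression}{$name}";    # scalar $regression followed by literal {test1}

    printf "outside: %s\n", ref $outside;    # prints "outside: HASH"
    printf "inside:  %s\n", $inside;         # prints "inside:  SCALAR{test1}"
    ```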

    There are really too many issues here for me to start guessing where you're stuck, especially without being able to talk about specifics. But it may help if I rewrite the initial code like this

    my %regression;
    print "Reading regression dir: $opt_dir\n";
    
    foreach my $f ( glob("$opt_dir/*.regress") ) {
    
        my ($name, $path, $suffix) = fileparse($f, '\.regress');
    
        $regression{$name}{file} = $f;
        my $file_details = $regression{$name};
    
        say "file $file_details->{file}";
    
        read_regress_file($f, $file_details);
    }
    

    I've copied the hash reference to $file_details and passed it to the subroutine like that. Can you see that each element of %regression is keyed by the name of the file, and that each value is a reference to another hash that contains the values filled in by read_regress_file?
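
    To make that concrete, here is a sketch of the subroutine side written with the arrow syntax. It is not your original read_regress_file: read_regress_fh is a hypothetical variant that takes an already opened filehandle so that the example can run against an in-memory string, and the name test1 and the file contents are made up

    ```perl
    use strict;
    use warnings;

    # Hypothetical variant of read_regress_file: parse "key=value" lines from
    # an open filehandle into the hash that $href points to
    sub read_regress_fh {
        my ( $fh, $href ) = @_;

        while ( my $line = <$fh> ) {
            next if $line =~ /^\s*#/ or $line =~ /^\s*$/;    # skip comments and blank lines
            chomp $line;

            my ( $key, $value ) = split /=/, $line, 2;       # split on the first "=" only
            $href->{$key} = $value // '';                    # same element as $$href{$key}
        }

        return;
    }

    my %regression;
    my $name = 'test1';                                      # made-up regression name
    $regression{$name}{file} = "$name.regress";              # creates the inner hash on first use

    my $text = "# a comment\nvar1=value\nvar2=a=b\n";        # made-up file contents
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    read_regress_fh( $fh, $regression{$name} );
    close $fh;

    # Prints file, var1 and var2 with their values, one per line
    print "$_ => $regression{$name}{$_}\n" for sort keys %{ $regression{$name} };
    ```

    Because the subroutine receives a reference, the assignments through $href land directly in the inner hash stored at $regression{$name}, so nothing needs to be returned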

    I hope this helps. This isn't really a forum for teaching language basics so I don't think I can do much better