Storing multiple values in one key in a hash of hash using Perl


I am trying to create a data structure to store data I am pulling from a database:

$Interaction{$TrGene}={
CisGene => $CisGene,
E => $e,
Q => $q,};

A single $TrGene is associated with a number of CisGenes (each of which has its own E & Q), e.g.:

TrGene1 CisGene1 Q1 E2

TrGene1 CisGene2 Q2 E3

The last TrGene1 overwrites those that came before it. I think that I need to create a reference to an array, but don't fully understand how this should be done after reading this webpage: http://perldoc.perl.org/perlreftut.html

I have attempted to use the country/city example on that webpage, but have not been very successful:

$Interaction{$TrGene}={
CisGene => $CisGene,
E => $e,
Q => $q,};
push @{$Interaction{$TrGene}}, $CisGene;

I get the error 'Not an ARRAY reference'. I have also only pushed $CisGene there, but the E & Q values for that CisGene must not be overwritten either. (So will this hash know that the CisGene is associated with a specific E and Q, or do I need to create another layer of hash for that?)

Thanks


Solution

  • Your code, but properly indented so it's readable.

    $Interaction{$TrGene} = {
        CisGene => $CisGene,
        E       => $e,
        Q       => $q,
    };
    push @{$Interaction{$TrGene}}, $CisGene;
    

    Code explained:

    You build an anonymous hash from a list of key-value pairs, using curly brackets {}, and you assign the resulting hash reference to the $TrGene key in the %Interaction hash. Then you try to use that value as an array reference by surrounding it with @{ ... }, which does not work: the value is a hash reference, so Perl dies with 'Not an ARRAY reference'.
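
    A minimal sketch of why the push fails. The sample values (TrGene1, CisGene1, E2, Q1) are borrowed from the question's example rows, and the eval wrapper is an addition here, used only so the script can report the error instead of dying:

    ```perl
    use strict;
    use warnings;

    my %Interaction;
    my ($TrGene, $CisGene, $e, $q) = ('TrGene1', 'CisGene1', 'E2', 'Q1');

    # Store a hash reference under the key, as in the question.
    $Interaction{$TrGene} = {
        CisGene => $CisGene,
        E       => $e,
        Q       => $q,
    };

    # ref() reports what kind of reference the value is.
    my $type = ref($Interaction{$TrGene});

    # Dereferencing a hash reference as an array is a fatal error;
    # eval traps it so we can inspect the message in $@.
    my $ok    = eval { push @{ $Interaction{$TrGene} }, $CisGene; 1 };
    my $error = $ok ? '' : $@;

    print "Value is a $type reference\n";
    print "push failed: $error" if !$ok;
    ```

    Checking ref() before dereferencing is a quick way to find out what a key really holds.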

    If you assign different values to the same hash key, each assignment overwrites the previous one. Let's take some practical examples; it's really quite easy.

    $Interaction{'foobar'} = {
        CisGene => 'Secret code',
        E       => 'xxx',
        Q       => 'yyy',
    };
    

    Now you've stored a hash reference under the key 'foobar'. What is stored is a reference to a free-standing data structure. I think it's easier to keep track of structures if you think of them as scalars: a hash (or array) can only ever contain scalars, and a reference is a scalar.

    The hash %Interaction may contain a number of keys, and if you have entered data like above, all the values will be hash references. E.g.:

    $hash1 = {  # note: curly brackets denote an anonymous hash 
        CisGene => 'Secret code',
        E       => 'xxx',
        Q       => 'yyy',
    };
    $hash2 = {
        CisGene => 'some other value',
        E       => 'foo',
        Q       => 'bar',
    };
    
    %Interaction = ( # note: regular parentheses denote a list
        'foobar'   => $hash1,  # e.g. CisGene => 'Secret code', ... etc. from above
        'barbar'   => $hash2,  # e.g. other key-value pairs surrounded by {}
        # ...
    );
    

    The value contained in $hash1 and $hash2 is now a reference: an address of data in memory. If you print it with print $hash1, you will see something like HASH(0x398a64).
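
    A small sketch of what "a reference is an address" means in practice, reusing the sample values from above. The variable names $type, $text and $alias are made up for this illustration:

    ```perl
    use strict;
    use warnings;

    my $hash1 = {
        CisGene => 'Secret code',
        E       => 'xxx',
        Q       => 'yyy',
    };

    my $type = ref($hash1);   # the kind of reference
    my $text = "$hash1";      # stringifies to something like HASH(0x...)

    # Copying the reference copies the address, not the data:
    # both variables now point at the same hash.
    my $alias = $hash1;
    $alias->{E} = 'changed';

    print "$type / $text / E is now $hash1->{E}\n";
    ```

    The hexadecimal address differs from run to run, so only its general shape is predictable.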

    Now, if you enter a new value into %Interaction using an existing key, the old value under that key will be overwritten, because a hash key can only ever hold one value: a scalar. In our case, a reference to a hash.
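
    To see the overwriting in action, here is a short sketch using the two example rows from the question (the variable names $key_count and $survivor are made up for this illustration):

    ```perl
    use strict;
    use warnings;

    my %Interaction;

    # Two rows for the same TrGene, as in the question's example data.
    $Interaction{TrGene1} = { CisGene => 'CisGene1', Q => 'Q1', E => 'E2' };
    $Interaction{TrGene1} = { CisGene => 'CisGene2', Q => 'Q2', E => 'E3' };

    my $key_count = keys %Interaction;               # still only one key
    my $survivor  = $Interaction{TrGene1}{CisGene};  # the second assignment won

    print "$key_count key; CisGene is $survivor\n";
    ```

    The first hash reference is simply discarded, which is the behaviour described in the question.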

    What you are trying to do in your example is use the value of the 'foobar' key as an array reference (which cannot work because, as you can now see above, it's a hash reference):

    push @{$Interaction{$TrGene}}, $CisGene;
    

    Rewritten:

    push @{  $hash1  }, 'Secret code';  # using the sample values from above
    

    No... that doesn't work.

    What you need is a new container. We'll make the value of the key 'foobar' an array reference instead:

    %Interaction = (
        'foobar'   => $array1,
        # ...
    );
    

    Where:

    $array1 = [ $hash1, $hash2 ];
    

    or

    $array1 = [       # note the square brackets to create anonymous array
                  {   # curly brackets for anonymous hash
                      CisGene => 'Secret code',
                      E       => 'xxx',
                      Q       => 'yyy',
                  },  # comma sign to separate array elements
                  {   # start a new hash
                      CisGene => 'Some other value',
                      E       => 'foo',
                      Q       => 'bar',
                  }   # end 
               ];     # end of $array1
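
    Put together as a runnable sketch, with the same sample values as above ($count and $second are names made up for this illustration):

    ```perl
    use strict;
    use warnings;

    my $hash1 = { CisGene => 'Secret code',      E => 'xxx', Q => 'yyy' };
    my $hash2 = { CisGene => 'some other value', E => 'foo', Q => 'bar' };

    # One array reference holding both hash references.
    my $array1 = [ $hash1, $hash2 ];
    my %Interaction = ( 'foobar' => $array1 );

    my $count  = scalar @{ $Interaction{'foobar'} };    # number of records
    my $second = $Interaction{'foobar'}[1]{'CisGene'};  # second record's CisGene

    print "$count records; second CisGene: $second\n";
    ```

    The key 'foobar' still holds a single scalar, but that scalar now leads to any number of records.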
    

    Now, this is all a rather cumbersome way to put things, so let's make it simpler:

    my $CisGene = 'foobar';
    my $e = 'xxx';
    my $q = 'yyy';
    
    my $hash1 = {
        CisGene => $CisGene,
        E       => $e,
        Q       => $q,
    };
    
    push @{$Interaction{$TrGene}}, $hash1;
    

    Or you can do away with the temp variable $hash1 and assign it directly:

    push @{$Interaction{$TrGene}}, {
        CisGene => $CisGene,
        E       => $e,
        Q       => $q,
    };
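
    The push works even when the key does not exist yet, because Perl creates the array reference on first use (autovivification). A sketch of the whole loop, assuming the database results arrive as rows of TrGene, CisGene, Q, E; the @rows array and its third row are hypothetical stand-ins for the real query:

    ```perl
    use strict;
    use warnings;

    # Hypothetical rows standing in for the database results:
    # the two rows from the question plus one invented row.
    my @rows = (
        [ 'TrGene1', 'CisGene1', 'Q1', 'E2' ],
        [ 'TrGene1', 'CisGene2', 'Q2', 'E3' ],
        [ 'TrGene2', 'CisGene3', 'Q3', 'E4' ],
    );

    my %Interaction;
    for my $row (@rows) {
        my ($TrGene, $CisGene, $q, $e) = @$row;

        # The first push for a given key creates the array reference
        # automatically; later pushes append to the same array.
        push @{ $Interaction{$TrGene} }, {
            CisGene => $CisGene,
            E       => $e,
            Q       => $q,
        };
    }

    for my $TrGene (sort keys %Interaction) {
        my $n = @{ $Interaction{$TrGene} };
        print "$TrGene has $n CisGene record(s)\n";
    }
    ```

    Each CisGene keeps its own E and Q because every record is a separate anonymous hash.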
    

    And when accessing the elements:

    for my $key (keys %Interaction) {  # lists the $TrGene keys 
        my $aref = $Interaction{$key}; # the array reference
        for my $hashref (@$aref) {     # extract hash references, e.g. $hash1
            my $CisGene = $hashref->{'CisGene'};
            my $e       = $hashref->{'E'};
            my $q       = $hashref->{'Q'};
        }
    }
    

    Note the use of the arrow operator when dealing directly with references. You can also say $$hashref{'CisGene'}.

    Or directly:

    my $CisGene = $Interaction{'foobar'}[0]{'CisGene'};
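
    The different access notations are interchangeable. A short sketch comparing them on the sample data (the variable names on the left are made up for this illustration):

    ```perl
    use strict;
    use warnings;

    my %Interaction = (
        'foobar' => [
            { CisGene => 'Secret code', E => 'xxx', Q => 'yyy' },
        ],
    );

    my $hashref = $Interaction{'foobar'}[0];

    my $with_arrow  = $hashref->{'CisGene'};     # arrow operator
    my $with_sigils = $$hashref{'CisGene'};      # double-sigil form
    my $with_block  = ${$hashref}{'CisGene'};    # explicit block form

    # Between adjacent subscripts the arrow is optional.
    my $direct   = $Interaction{'foobar'}[0]{'CisGene'};
    my $explicit = $Interaction{'foobar'}->[0]->{'CisGene'};

    print join(', ', $with_arrow, $with_sigils, $with_block, $direct, $explicit), "\n";
    ```

    Which form to use is a matter of taste; the arrow form tends to stay readable as nesting gets deeper.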
    

    I recommend reading perldata. A very handy module is Data::Dumper. If you do:

    use Data::Dumper;
    print Dumper \%Interaction; # note the backslash, Dumper wants references
    

    It will print out your data structure for you, which makes it very easy to see what you are doing. Make note of its use of square brackets and curly brackets to denote arrays and hashes.
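
    A self-contained sketch of dumping the structure built with push. The two records reuse the question's example rows, and the Sortkeys and Indent settings are optional choices made here to keep the output stable and compact:

    ```perl
    use strict;
    use warnings;
    use Data::Dumper;

    # Optional settings: sorted keys and compact indentation
    # make the dump stable and easier to compare.
    $Data::Dumper::Sortkeys = 1;
    $Data::Dumper::Indent   = 1;

    my %Interaction;
    push @{ $Interaction{'TrGene1'} }, { CisGene => 'CisGene1', E => 'E2', Q => 'Q1' };
    push @{ $Interaction{'TrGene1'} }, { CisGene => 'CisGene2', E => 'E3', Q => 'Q2' };

    # Dumper returns the dump as a string, so it can be printed or logged.
    my $dump = Dumper(\%Interaction);
    print $dump;
    ```

    In the output, the outer { } is %Interaction, the [ ] is the array under 'TrGene1', and each inner { } is one CisGene record.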