
Perl: Generating Arrays inside a Complex Hash


In the quest to make my data more accessible, I want to store my tabulated data in a complex hash. I am trying to grow a 'HoHoHoA' as the script loops over my data. As per the guidelines in 'perldsc':

push @ { $hash{$column[$i]}{$date}{$hour} }, $data[$i];

The script compiles and runs without a problem, but does not add any data to the hash:

print $hash{"Frequency Min"}{"09/07/08"}{"15"}; 

prints nothing even though the keys should exist. Running 'exists' on that entry confirms that it is not there.

The data file that I am reading looks like this:

DATE       TIME     COLUMN1 COLUMN2 COLUMN3...    
09/06/2008 06:12:56 56.23   54.23   56.35...
09/06/2008 06:42:56 56.73   55.28   54.52...
09/06/2008 07:12:56 57.31   56.79   56.41...
09/06/2008 07:42:56 58.24   57.30   58.86...
.
.
.

I want to group together the values of each column in an array for any given date and hour, hence the three hashes for {COLUMN}, {DATE} and {HOUR}.

The resultant structure will look like this:

%monthData = (
               "COLUMN1" => {
                                    "09/06/2008" => {
                                                      "06" => [56.23,56.73...],
                                                      "07" => [57.31,58.24...]
                                                    }
                            },
               "COLUMN2" => {
                                    "09/06/2008" => {
                                                      "06" => [54.23,55.28...],
                                                      "07" => [56.79,57.30...]
                                                    }
                            },
               "COLUMN3" => {
                                    "09/06/2008" => {
                                                      "06" => [56.35,54.52...],
                                                      "07" => [56.41,58.86...]
                                                    }
                            }
             );

Take a look at my code:

use feature 'switch';
open DATAFILE, "<", $fileName or die "Unable to open $fileName !\n";

    my %monthData;

    while ( my $line = <DATAFILE> ) {

        chomp $line;

        SCANROWS: given ($row) {

            when (0) { # PROCESS HEADERS

                @headers = split /\t\t|\t/, $line;
            }

            default {

                @current = split /\t\t|\t/, $line;
                my $date =  $current[0];
                my ($hour,$min,$sec) = split /:/, $current[1];

                # TIMESTAMP FORMAT: dd/mm/yyyy\t\thh:mm:ss

                SCANLINE: for my $i (2 .. $#headers) {

                    push @{ $monthData{$headers[$i]}{$date}{$hour} }, $current[$i];

                }
            }
        }
    }

    close DATAFILE;

    foreach (@{ $monthData{"Active Power N Avg"}{"09/07/08"}{"06"} }) {
        $sum += $_;
        $count++;
    }

    $avg = $sum/$count; # $sum and $count are not initialized to begin with.
    print $avg; # hence $avg is also not defined.

Hope my need is clear enough. How can I append values to an array inside these sub-hashes?


Solution

  • This should do it for you.

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    use List::Util qw/sum/;
    sub avg { sum(@_) / @_ }
    
    my $fileName = shift;
    
    open my $fh, "<", $fileName
        or die "Unable to open $fileName: $!\n";
    
    my %monthData;
    
    chomp(my @headers = split /\t+/, <$fh>);
    
    while (<$fh>) {
        chomp;
        my %rec;
        @rec{@headers} = split /\t+/;
        my ($hour) = split /:/, $rec{TIME}, 2;
    
        for my $key (grep { not /^(DATE|TIME)$/ } keys %rec) {
            push @{ $monthData{$key}{$rec{DATE}}{$hour} }, $rec{$key};
        }
    }
    
    for my $column (keys %monthData) {
        for my $date (keys %{ $monthData{$column} }) {
            for my $hour (keys %{ $monthData{$column}{$date} }) {
                my $avg = avg @{ $monthData{$column}{$date}{$hour} };
                print "average of $column for $date $hour is $avg\n";
            }
        }
    }
    

    Things to pay attention to:

    • strict and warnings pragmas
    • List::Util module to get the sum function
    • using an array in scalar context to get its number of items (in the avg function)
    • the safer three argument version of open
    • the lexical filehandle (rather than the old bareword style filehandle)
    • reading the headers first outside the loop to avoid having to have special logic inside it
    • using a hash slice to get the file data into a structured record
    • avoiding splitting the time more than necessary with the third argument to split
    • avoiding useless variables by only specifying the variable we want to catch in the list assignment
    • using grep to prevent the DATE and TIME keys from being put in %monthData
    • the nested for loops each dealing with a level in the hash
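
To try several of the points above in isolation, here is a minimal sketch. It assumes a small in-memory sample in place of the real file: the @lines array is hypothetical, with values borrowed from the example data in the question and tabs assumed as separators. It only illustrates the hash slice, the limited split, the autovivifying push and the scalar-context average, and is not meant to replace the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use List::Util qw/sum/;

# An array in numeric (scalar) context yields its number of elements.
sub avg { sum(@_) / @_ }

# Hypothetical in-memory sample standing in for the tab-separated file.
my @lines = (
    "DATE\tTIME\tCOLUMN1\tCOLUMN2",
    "09/06/2008\t06:12:56\t56.23\t54.23",
    "09/06/2008\t06:42:56\t56.73\t55.28",
    "09/06/2008\t07:12:56\t57.31\t56.79",
);

my @headers = split /\t+/, shift @lines;
my %monthData;

for my $line (@lines) {
    # Hash slice: pair each header with the field in the same position.
    my %rec;
    @rec{@headers} = split /\t+/, $line;

    # A limit of 2 stops splitting after the first colon; the list
    # assignment keeps only the first piece (the hour).
    my ($hour) = split /:/, $rec{TIME}, 2;

    for my $key (grep { not /^(DATE|TIME)$/ } keys %rec) {
        # push creates the intermediate hashes and the array on demand.
        push @{ $monthData{$key}{ $rec{DATE} }{$hour} }, $rec{$key};
    }
}

my $count = @{ $monthData{COLUMN1}{"09/06/2008"}{"06"} };
my $avg   = avg @{ $monthData{COLUMN1}{"09/06/2008"}{"06"} };
printf "%d values, average %.2f\n", $count, $avg;
```

With this sample the 06 hour of COLUMN1 holds two values, so the last line prints "2 values, average 56.48". Note that the lookup keys must match the data exactly: the date is stored as "09/06/2008", so a key such as "09/07/08" would find nothing.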