
What is the best data structure in Perl to store tabular data?


I have a table with the following data:

1.1.1.1   routerA  texas
2.2.2.2   routerB  texas
3.3.3.3   routerC  california

What is the best data structure in Perl to store this data? I am thinking of storing it in a hash of hashes, with the IP address as the key:

1.1.1.1 
 routerA => texas,
2.2.2.2
 routerB => texas,
3.3.3.3
 routerC => california

But if I want to get all the IP addresses in Texas, my data structure may not be flexible enough. Is there a better way to store this data if I care about looking up all IP addresses in Texas?


Solution

  • Pure Perl is definitely up to this task.

    Think of a table as an array of records. In Perl speak, that is an array of hash references. (An array of arrays, or AoA, may be applicable at times; remember TIMTOWTDI.)

    The keys of each hash reference correspond to the column/field name and the values will be, well, the values for that particular record.

    Converting the OP's example to a data structure:

    my @data = (
                 {
                    ip     => '1.1.1.1',
                    router => 'routerA',
                    state  => 'texas',
                 },
                 {
                    ip     => '2.2.2.2',
                    router => 'routerB',
                    state  => 'texas',
                 },
                 {
                    ip     => '3.3.3.3',
                    router => 'routerC',
                    state  => 'california',
                 }
               );
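
    If the table arrives as plain text rather than as literals, the same array of hash references can be built in a loop. The following is a minimal sketch, assuming whitespace-separated columns in the order ip, router, state. The names @lines and @columns are illustrative and not part of the original question.

    ```perl
    use strict;
    use warnings;

    # Raw rows as they appear in the question: ip, router, state.
    my @lines = (
        '1.1.1.1   routerA  texas',
        '2.2.2.2   routerB  texas',
        '3.3.3.3   routerC  california',
    );

    # Assumed column names; they match the hash keys used in the answer.
    my @columns = qw(ip router state);

    my @data;
    for my $line (@lines) {
        # split ' ' splits on runs of whitespace and skips leading whitespace.
        my @fields = split ' ', $line;
        next unless @fields == @columns;    # skip blank or malformed rows

        my %record;
        @record{@columns} = @fields;        # hash slice pairs names with values
        push @data, \%record;
    }

    # Print each record back out in column order.
    printf "%-8s %-8s %s\n", @{$_}{@columns} for @data;
    ```

    Because %record is declared inside the loop, each iteration pushes a reference to a fresh hash, so the records do not overwrite one another.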
    

    Now for the fun part:

    # Give me all IPs in Texas
    
    my @ips_in_texas = map $_->{ip},
                        grep { $_->{state} =~ /texas/i }
                         @data;
    
    # How many states does the data cover?
    
    use List::MoreUtils 'uniq';
    
    # In scalar context, uniq returns the number of distinct values.
    my $states_covered = uniq( map $_->{state}, @data );
    
    # How many unique IPs in each state?
    
    my %ips_by_state;
    $ips_by_state{ $_->{state} }{ $_->{ip} }++ for @data;
    print "'$_': ", scalar keys %{ $ips_by_state{$_} }, "\n" for keys %ips_by_state;
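
    If the same lookups run many times, scanning @data with grep on every query can be avoided by building index hashes once. This is a sketch under the assumption that IPs are unique and that state names should match case-insensitively. %by_ip and %by_state are hypothetical names chosen for illustration.

    ```perl
    use strict;
    use warnings;

    my @data = (
        { ip => '1.1.1.1', router => 'routerA', state => 'texas' },
        { ip => '2.2.2.2', router => 'routerB', state => 'texas' },
        { ip => '3.3.3.3', router => 'routerC', state => 'california' },
    );

    # Both indexes store references to the records already in @data,
    # so the records themselves are not copied.
    my ( %by_ip, %by_state );
    for my $record (@data) {
        $by_ip{ $record->{ip} } = $record;                      # one record per IP
        push @{ $by_state{ lc $record->{state} } }, $record;    # many records per state
    }

    # All IPs in Texas, without scanning @data again.
    my @ips_in_texas = map { $_->{ip} } @{ $by_state{texas} || [] };
    print "texas: @ips_in_texas\n";

    # Direct lookup by IP, as in the hash-of-hashes idea from the question.
    print "2.2.2.2 is $by_ip{'2.2.2.2'}{router}\n";
    ```

    The trade-off is that the indexes have to be updated whenever a record is added to or removed from @data, or when its state changes.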
    

    The knee-jerk reaction I often get when I suggest this data structure centers on its hunger for memory. Frankly speaking, that won't be an issue unless you're dealing with millions of records. And if that is the case, a DBMS is the tool you seek, not an in-memory Perl data structure.
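
    When memory is a concern but a DBMS feels like overkill, the AoA mentioned earlier is typically lighter, since each row is a plain array with no per-record hash keys. A minimal sketch follows, assuming the column order is fixed. The constants IP, ROUTER and STATE are illustrative names, not something from the question.

    ```perl
    use strict;
    use warnings;

    # Column positions, named so the code does not rely on bare numbers.
    use constant { IP => 0, ROUTER => 1, STATE => 2 };

    my @rows = (
        [ '1.1.1.1', 'routerA', 'texas' ],
        [ '2.2.2.2', 'routerB', 'texas' ],
        [ '3.3.3.3', 'routerC', 'california' ],
    );

    # Same "all IPs in Texas" query, with positional access instead of hash keys.
    my @ips_in_texas = map  { $_->[IP] }
                       grep { lc( $_->[STATE] ) eq 'texas' }
                       @rows;
    print "@ips_in_texas\n";
    ```

    The cost is readability: the meaning of each position lives in the constants rather than in the data, so adding or reordering a column means updating them.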