Search code examples
arraysperlhashreferenceperl-data-structures

Deleting and adding Hash and Array references in Perl


Continuing on from my previous question, I ran into another problem down the road. I realized not only should there be hashes within hashes, there can also be arrays within hashes. So the paths would be something like

one/two/three
one/two[]/three
one/two/four
one/two[]/four 

i.e. the hash is supposed to contain the array will always have a [] as its suffix. According to the script I'm using (slightly modified version of the answer of my previous question), the above paths would result in:

one => {
     two => {
         three => "",
         four => "",
     }
     two[] => [
         {
             three => "",
             four => "",
         }
     ]
}

The script I'm using is:

# !/usr/bin/perl

use Data::Dumper; 

sub insert {
  my ($ref, $head, @tail) = @_;
  if ( @tail ) { 
    if( $head !~ /^(.*)(\[\])$/ ) {
        insert( \%{$ref->{$head}}, @tail );
    } else {
        my %newhash = ();
        unshift(@{$ref->{$1 . $2}}, %newhash);
        insert( \%{$ref->{$1 . $2}[0]}, @tail );
    }
  } else {
    $ref->{$head} = '';
  }
}

my %hash;
chomp and insert \%hash, split( '/', $_ ) while <>;

print Dumper %hash;

What I'd like to do, is once I find two[], I'd like to delete two and add it to the array of two[] (if two exists) and then rename key two[] to two.

So the end result would look something like:

one => {
    two => [
        {
            three => "",
            four => "",
        },
        {
            three => "",
            four => "",
        }
    ]
}

So I tried adding checks within the if else for checking keys with or without [] suffixes but I got a range or errors like [$variable] is not a valid HASH reference, etc. How would one go about checking the type of the variable (eg $ref->{$head} is array?) and deleting and renaming keys of the hash efficiently?

Thanks.


Solution

  • Okay, by all rights, this should suck AND not do what you want - But I spent the last hour trying to get it somewhat right, so I'll be damned. Each 'anything[]' is an array of two elements, each a hashref: One for elements that appear after a bare 'anything', and the second for elements appearing after a 'anything[]'. I probably should have used a closure instead of relying on that crappy $is_non_bracket variable -- I'll take another look in the morning when I'm less retarded and more ashamed of writing this.

    I think that it's tail-call optimized (the goto &SUB part). It also makes (small) use of named captures.

    use strict;
    use warnings;
    use 5.010;
    use Data::Dumper;
    
    sub construct {
        my $node = shift;
        return unless @_;
        my $next           = shift;
        my $is_non_bracket = 1;
    
        $next .= '[]' and $is_non_bracket-- if exists $node->{ $next . '[]' };
        if ( $next =~ / (?<node>[^\[\]]+) \Q[]/x ) {
            if ( exists $node->{ $+{node} } or not defined( $node->{$next} ) ) {
                push @{ $node->{$next} }, (delete $node->{ $+{node} } // {}); #/
             }
             unshift @_, $node->{$next}->[$is_non_bracket] ||= {};
        }
        else {
            $node->{$next} ||= @_ ? {} : $node->{$next};
            unshift @_, $node->{$next} //= @_ ? {} : ''; #/
        }
        goto &construct;
    }
    
    
    my %hash;
    
    while (<DATA>) {
        chomp;
        construct( \%hash, split m!/! );
    }
    
    say Dumper \%hash;
    
    __DATA__
    one/two/three
    one/two[]/three
    one/two[]/three/four
    one/two[]/three/four/five[]
    one/two[]/three/four/whatever
    one/two/ELEVAN
    one/three/sixteen
    one/three[]/whygodwhy
    one/three/mrtest/mruho
    one/three/mrtest/mruho[]/GAHAHAH
    

    EDIT: Regex had an extra space after the quotemeta that made it break down; My bad.

    EDIT2: Okay, it's the morning, edited in a version that isn't so stupid. No need for the ref, as we always pass a hashref; The #/ are there to stop the //'s from borking the highlighting.

    EDIT3: Just noticed you DON'T want those [] to show up in the data structure, so here's a version that doesn't show them:

    sub construct {
        my $node = shift;
        return unless @_;
        my $is_bracket = (my $next = shift) =~ s/\Q[]// || 0; 
    
        if (ref $node->{$next} eq 'ARRAY' or $is_bracket) {
            if ( ref $node->{ $next } ne 'ARRAY' ) {
                my $temp = delete $node->{ $next } || {};
                push @{ $node->{$next} = [] }, $temp;
             }
             unshift @_, $node->{$next}->[$is_bracket] ||= {};
        }
        else {
            $node->{$next} ||= @_ ? {} : $node->{$next};
            unshift @_, $node->{$next} //= @_ ? {} : ''; #/
        }
        goto &construct;
    }
    

    EDITNaN: Here's the gist of what it does: If there are enough arguments, we shift for a second time and put the value in the $next, which is promptly pulled into a substitution, which takes away its [], should it have any: If it does, the substitution returns 1, otherwise, s/// returns undef (or the empty string, I forget), so we use the logical-or to set the return value to 0; Either way, we set $is_bracket to this.

    Afterwards, if $node->{$next} is an arrayref or $next had brackets: If $node->{$next} wasn't an arrayref (so we got here because $next had brackets, and it was the first time this has happened), it' either undef, the empty string, or a hashref; We delete whatever it is, and store it in $temp. We then set the now-empty $node->{$next} to an arrayref, and set(push) $temp as its first element - Meaning that, for instance, if 'two' had existed previous, and $next was originally 'two[]', then 'two' will now point to an arrayref, and its old value will be stored in [0]. Once $node->{$next} is an arrayref (or if it already was), we unshift the hashref in the index pointed by $is_backet - 0 if $next didn't have brackets, and 1 if it did - to @_. If the hashref doesn't exist (either because it's undef, for both, or possibly the empty string, for 0), we assign it an brand new hashref with the logical-or.

    If it wasn't an arrayref, it's a hashref, so we do the same thing as before, and unshift the resulting value to @_.

    We do all this unshifting because the magic goto passes our current @_ to the function that will replace us.