Search code examples
perlhashmergeperl-data-structures

How can I merge several hashes into one hash in Perl?


In Perl, how do I get this:

$VAR1 = { '999' => { '998' => [ '908', '906', '0', '998', '907' ] } }; 
$VAR1 = { '999' => { '991' => [ '913', '920', '918', '998', '916', '919', '917', '915', '912', '914' ] } }; 
$VAR1 = { '999' => { '996' => [] } }; 
$VAR1 = { '999' => { '995' => [] } }; 
$VAR1 = { '999' => { '994' => [] } }; 
$VAR1 = { '999' => { '993' => [] } }; 
$VAR1 = { '999' => { '997' => [ '986', '987', '990', '984', '989', '988' ] } }; 
$VAR1 = { '995' => { '101' => [] } }; 
$VAR1 = { '995' => { '102' => [] } }; 
$VAR1 = { '995' => { '103' => [] } }; 
$VAR1 = { '995' => { '104' => [] } }; 
$VAR1 = { '995' => { '105' => [] } }; 
$VAR1 = { '995' => { '106' => [] } }; 
$VAR1 = { '995' => { '107' => [] } }; 
$VAR1 = { '994' => { '910' => [] } }; 
$VAR1 = { '993' => { '909' => [] } }; 
$VAR1 = { '993' => { '904' => [] } }; 
$VAR1 = { '994' => { '985' => [] } }; 
$VAR1 = { '994' => { '983' => [] } }; 
$VAR1 = { '993' => { '902' => [] } }; 
$VAR1 = { '999' => { '992' => [ '905' ] } }; 

to this:

$VAR1 = { '999:' => [
 { '992' => [ '905' ] },
 { '993' => [
  { '909' => [] },
  { '904' => [] },
  { '902' => [] }
 ] },
 { '994' => [
  { '910' => [] },
  { '985' => [] },
  { '983' => [] }
 ] },
 { '995' => [
  { '101' => [] },
  { '102' => [] },
  { '103' => [] },
  { '104' => [] },
  { '105' => [] },
  { '106' => [] },
  { '107' => [] }
 ] },
 { '996' => [] },
 { '997' => [ '986', '987', '990', '984', '989', '988' ] },
 { '998' => [ '908', '906', '0', '998', '907' ] },
 { '991' => [ '913', '920', '918', '998', '916', '919', '917', '915', '912', '914' ] }
]};

Solution

  • I think this is closer than anybody else has gotten:

    This does most of what you want. I did not store things in arrays of singular hashes, as I don't feel that that is useful.

    Your scenario is not a regular one. I've tried to genericize this to some extent, but was not possible to overcome the singularity of this code.

    • First of all because it appears you want to collapse everything with the same id into a merged entity (with exceptions), you have to descend through the structure pulling the definitions of the entities. Keeping track of levels, because you want them in the form of a tree.

    • Next, you assemble the ID table, merging entities as possible. Note that you had 995 defined as an empty array one place and as a level another. So given your output, I wanted to overwrite the empty list with the hash.

    • After that, we need to move the root to the result structure, descending that in order to assign canonical entities to the identifiers at each level.

    Like I said, it's not anything that regular. Of course, if you still want a list of hashes which are no more than pairs, that's an exercise left to you.

    use strict;
    use warnings;
    
    # subroutine to identify all elements
    sub descend_identify {
        my ( $level, $hash_ref ) = @_;
        # return an expanding list that gets populated as we desecend 
        return map {
            my $item = $hash_ref->{$_};
            $_ => ( $level, $item )
                , ( ref( $item ) eq 'HASH' ? descend_identify( $level + 1, $item ) 
                  :                          ()
                  )
               ;
        } keys %$hash_ref
        ;
    }
    
    # subroutine to refit all nested elements
    sub descend_restore { 
        my ( $hash, $ident_hash ) = @_;
    
        my @keys        = keys %$hash;
        @$hash{ @keys } = @$ident_hash{ @keys };
        foreach my $h ( grep { ref() eq 'HASH' } values %$hash ) {
            descend_restore( $h, $ident_hash );
        }
        return;
    }
    
    # merge hashes, descending down the hash structures.
    sub merge_hashes {
        my ( $dest_hash, $src_hash ) = @_;
        foreach my $key ( keys %$src_hash ) {
            if ( exists $dest_hash->{$key} ) {
                my $ref = $dest_hash->{$key};
                my $typ = ref( $ref );
                if ( $typ eq 'HASH' ) {
                    merge_hashes( $ref, $src_hash->{$key} );
                }
                else { 
                    push @$ref, $src_hash->{$key};
                }
            }
            else {
                $dest_hash->{$key} = $src_hash->{$key};
            }
        }
        return;
    }
    
    my ( %levels, %ident_map, %result );
    
    #descend through every level of hash in the list
    # @hash_list is assumed to be whatever you Dumper-ed.
    my @pairs = map { descend_identify( 0, $_ ); } @hash_list;
    
    while ( @pairs ) {
        my ( $key, $level, $ref ) = splice( @pairs, 0, 3 );
        $levels{$key} |= $level;
    
        # if we already have an identity for this key, merge the two
        if ( exists $ident_map{$key} ) {
            my $oref = $ident_map{$key};
            my $otyp = ref( $oref );
            if ( $otyp ne ref( $ref )) {
                # empty arrays can be overwritten by hashrefs -- per 995
                if ( $otyp eq 'ARRAY' && @$oref == 0 && ref( $ref ) eq 'HASH' ) {
                    $ident_map{$key} = $ref;
                }
                else { 
                    die "Uncertain merge for '$key'!";
                }
            }
            elsif ( $otyp eq 'HASH' ) {
                merge_hashes( $oref, $ref );
            }
            else {
                @$oref = sort { $a <=> $b || $a cmp $b } keys %{{ @$ref, @$oref }};
            }
        }
        else {
            $ident_map{$key} = $ref;
        }
    }
    
    # Copy only the keys that do not appear at higher levels to the 
    # result hash
    if ( my @keys = grep { !$levels{$_} } keys %ident_map ) { 
        @result{ @keys } = @ident_map{ @keys } if @keys;
    
    }
    # then step through the hash to make sure that the entries at
    # all levels are equal to the identity
    descend_restore( \%result, \%ident_map );