Search code examples
perlrecursionunfold

Deserializing a nested hash from lists of hash keys


I have a string that I would like to "unflatten" or "tree-ify"; that is, I want to go from this:

F=8|A_C=3|A_B=2|D_G_H=11|D_B=2|E=5  

to this:

{
  A => {
    B => 2,
    C => 3,
  },
  D => {
     B => 2,
     G => {
       H => 11,
     },
  },
  E => 5,
  F => 8,
}

My strategy was to process each pipe-delimited field separately and split by the = sign into a key / value pair:

sub unflatten {
    my ($data) = @_;
    my @fields = split /\|/, $data;
    my $result = {};

    for my $datum (@fields) {
            my ($key, $value) = split /=/, $datum;
            $result->{&processline($key)} = $value;
    }
    return $result;
}

I was attempting some recursive magic in the processline function:

sub processline {
    my ($key) = @_;

    my ($first, $rest) = split /_/, $key, 2; # split key into at most 2 parts
    if($rest) {
            return { $first => &processline($rest) }; 
            # if the key is nested, there will be something in $rest
            # so recursively process the smaller $rest, and build up the result hashref
    }
    else {
            return $first;
    }
}

Unfortunately, this doesn't work:

my $header = "F=8|A_C=3|A_B=2|D_G_H=11|D_B=2|E=5";
use Data::Dumper;
print Dumper &unflatten($header);

When I do this, I get:

$VAR1 = {
      'F' => '8',
      'HASH(0xe9af60)' => '2',
      'HASH(0xe9ae28)' => '11',
      'E' => '5',
      'HASH(0xe9af90)' => '3',
      'HASH(0xe9ae40)' => '2'
    };

Could someone explain the thought process behind a recursive solution, or suggest where my Perl has gone so wrong? Frustratingly, I was able to come up with the inverse of this function (flatten) pretty easily.


Solution

  • I believe this is more straightforward with a simple for loop than using recursion. The method you have chosen cannot work because it uses only a single key returned by processline to assign the value, and that cannot create a multilevel hash.

    The way a recursive solution would work is by taking a hash reference, a list of keys, and a value, and defining

    unflatten($hash, 'key1_key2_key3_key4', 'value')
    

    as

    unflatten($hash->{key1}, 'key2_key3_key4', 'value')`
    

    This program demonstrates a plain loop solution. It uses a pointer $hash that starts at the root of the resultant hash and moves forward a level after each key in the list.

    sub unflatten {
    
      my $result = {};
    
      for my $item (split /\|/, $_[0]) {
    
        my ($keys, $value) = split /=/, $item;
        my @keys = split /_/, $keys;
        my $hash = $result;
    
        while (@keys > 1) {
          my $key = shift @keys;
          $hash->{$key} ||= {};
          $hash = $hash->{$key};
        }
    
        $hash->{$keys[0]} = $value;
      }
    
      return $result;
    }
    

    output

    $VAR1 = {
          'A' => {
                   'C' => '3',
                   'B' => '2'
                 },
          'F' => '8',
          'D' => {
                   'G' => {
                            'H' => '11'
                          },
                   'B' => '2'
                 },
          'E' => '5'
        };
    

    Update

    Now that I'm back at a keyboard, here's a recursive solution. It results in an identical hash to the original

    use strict;
    use warnings;
    
    use Data::Dumper;
    
    my $data = 'F=8|A_C=3|A_B=2|D_G_H=11|D_B=2|E=5';
    
    my $result = {};
    unflatten2($result, $_) for split /\|/, $data;
    print Dumper $result;
    
    sub unflatten2 {
      my ($hash, $data) = @_;
    
      if ($data =~ /_/) {
        my ($key, $rest) = split /_/, $data;
        unflatten2($hash->{$key} ||= {}, $rest);
      }
      else {
        my ($key, $val) = split /=/, $data;
        $hash->{key} = $val;
      }
    }
    

    Update

    You may also be interested in the Data::Diver module, which is intended for situations like this, although the documentation is a little clumsy

    Here's how a solution using it would look

    use strict;
    use warnings;
    
    use Data::Diver qw/ DiveVal /;
    use Data::Dumper;
    
    my $data = 'F=8|A_C=3|A_B=2|D_G_H=11|D_B=2|E=5';
    
    my $result = {};    
    
    for (split /\|/, $data) {
      my ($keys, $val) = split /=/;
      DiveVal($result, split /_/, $keys) = $val;
    }
    
    print Dumper $result;