
Find the difference between two nested Perl hashes


I'm trying to find the differences between two files that contain key/value entries, and to report which keys and values were added or deleted. Currently I'm using the Linux diff command, but it also reports a difference when only the order of the values changes. I don't want those listed, because for my purposes a reordering is not a real change.

file1:

key1    kamal1.google.com kamal2.google.com kamal3.google.com 
key2    kamal4.google.com 

file2:

key1    kamal1.google.com kamal6.google.com kamal3.google.com 
key3    kamal4.google.com

What I need:

  • Report that key2 was deleted with value kamal4.google.com, key3 was added with value kamal4.google.com, kamal2.google.com was deleted from key1, and kamal6.google.com was added to key1.
  • The message wording is only illustrative; it can be changed to something more meaningful.

My approach:

  • Read each file into its own hash, e.g. key1 => {kamal1.google.com => 1, ...}, key2 => {kamal4.google.com => 1}. The values are stored as hash keys rather than in an array so that the diff can be done efficiently.
  • Loop over the keys of both hashes and check which hash each key exists in.
  • Make a recursive call to diff the values (because they are hashes too).
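
The set representation described in the first bullet can be sketched in isolation. This is a minimal illustration, not the code from the question: `to_set` and `set_minus` are hypothetical helper names, and the values are the key1 entries from the two sample files. Because the values are hash keys, their order in the file does not matter.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of values into a set (hash ref).
sub to_set {
    my %set = map { $_ => 1 } @_;
    return \%set;
}

# Hypothetical helper: sorted members of $left that are absent from $right.
sub set_minus {
    my ($left, $right) = @_;
    my @only_left = grep { !exists $right->{$_} } keys %$left;
    my @sorted    = sort @only_left;
    return @sorted;
}

# key1 from file1 and file2; file2's values are deliberately reordered.
my $old_key1 = to_set(qw(kamal1.google.com kamal2.google.com kamal3.google.com));
my $new_key1 = to_set(qw(kamal3.google.com kamal6.google.com kamal1.google.com));

my @deleted = set_minus($old_key1, $new_key1);   # present only in the old set
my @added   = set_minus($new_key1, $old_key1);   # present only in the new set

print "deleted: @deleted\n";   # deleted: kamal2.google.com
print "added: @added\n";       # added: kamal6.google.com
```

Each `exists` lookup is a constant-time hash access, which is the efficiency gain over scanning an array for every value.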

Problems with my code:
- It does not work for the nested level.
- It loses track of the parent key.

Code:

use strict;
use warnings;
use Data::Dumper;   # provides Dumper(), used below

my $file1 = 'file1';
my $file2 = 'file2';

my $old = hashifyFile($file1);
my $new = hashifyFile($file2);
my $result = {};
compareHashes($old , $new, $result);
print Dumper $result;

sub compareHashes {
    my ($hash1, $hash2, $result) = @_;

    for my $key (keys %$hash1, keys %$hash2) {
        if (not exists $hash2->{$key}) {
            push @{$result->{deleted}->{$key}}, keys %{$hash1->{$key}};
        } elsif (not exists $hash1->{$key}) {
            push @{$result->{added}->{$key}}, keys %{$hash2->{$key}};
        } elsif (ref $hash1->{$key} eq 'HASH' or ref $hash2->{$key} eq 'HASH' ) {
            compareHashes($hash1->{$key}, $hash2->{$key}, $result);
        }
    }
}

# helper functions
sub trim {
   my $val = shift;
   $val =~ s/^\s*|\s*$//g;
   return $val;
}


sub hashifyFile {
    my $file = shift;
    my $contents = {};
    open my $file_fh, '<', $file or die "couldn't open $file $!";

    my ($key, @val);
    while (my $line = <$file_fh>) {
        # skip blank lines and comments
        next if $line =~ /^\s*$/;
        next if $line =~ /^#/;
        # print "$. $line";

        # a line that starts with a word character is a "key values" line;
        # a line that starts with at least 4 spaces holds more values for the previous key
        if ($line =~ /^\w/) {
            ($key, @val) = split /\s+|=/, $line;
        } elsif ($line =~ /^\s{4,}\w/) {
            push @val, split /\s+/, $line;
        }
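        next unless defined $key;   # ignore continuation lines that precede any key

        # store the values as hash keys so they behave like a set
        my %temp_hash;
        for (@val) {
            my $value = trim($_);
            $temp_hash{$value} = 1 if length $value;
        }
        $key = trim($key);
        $contents->{$key} = \%temp_hash;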

    }

    close $file_fh;
    return $contents;
}

Solution

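  • Here is an example based on your description. It passes the parent key down through the recursion, so a difference found in a nested hash is recorded under its parent key, and it collects the keys of both hashes into %all_keys so that each key is visited only once. Please clarify if this is not what you wanted.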

    sub compareHashes {
        my ($hash1, $hash2, $result, $parent) = @_;
    
        my %all_keys = map {$_ => 1} keys %$hash1, keys %$hash2;
    
        for my $key (keys %all_keys) {
            if (not exists $hash2->{$key}) {
                if ( defined $parent ) {
                    push @{$result->{deleted}->{$parent}}, $key;
                }
                else {
                    push @{$result->{deleted}->{$key}}, keys %{$hash1->{$key}};
                }
            } elsif (not exists $hash1->{$key}) {
                if ( defined $parent ) {
                    push @{$result->{added}->{$parent}}, $key;
                }
                else {
                    push @{$result->{added}->{$key}}, keys %{$hash2->{$key}};
                }
            }
            else {
                if ((ref $hash1->{$key} eq 'HASH') and (ref $hash2->{$key} eq 'HASH') ) {
                    compareHashes($hash1->{$key}, $hash2->{$key}, $result, $key);
                }
            }
        }
    }
    

    Output:

    $VAR1 = {
              'added' => {
                           'key3' => [
                                       'kamal4.google.com'
                                     ],
                           'key1' => [
                                       'kamal6.google.com'
                                     ]
                         },
              'deleted' => {
                             'key2' => [
                                         'kamal4.google.com'
                                       ],
                             'key1' => [
                                         'kamal2.google.com'
                                       ]
                           }
            };
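
Since the question says the message wording is only illustrative, here is one possible way to turn that structure into readable lines. It is a rough sketch: `report_lines` is a hypothetical helper name, and `$result` is the structure shown above written out literally so that the snippet runs on its own.

```perl
use strict;
use warnings;

# The structure printed above, written out literally for this sketch.
my $result = {
    added   => { key3 => ['kamal4.google.com'], key1 => ['kamal6.google.com'] },
    deleted => { key2 => ['kamal4.google.com'], key1 => ['kamal2.google.com'] },
};

# Hypothetical helper: flatten the result into sorted, readable lines.
sub report_lines {
    my ($result) = @_;
    my @lines;
    for my $action (sort keys %$result) {
        for my $key (sort keys %{ $result->{$action} }) {
            my @values = sort @{ $result->{$action}{$key} };
            push @lines, "$action $key: @values";
        }
    }
    return @lines;
}

my @lines = report_lines($result);
print "$_\n" for @lines;
# added key1: kamal6.google.com
# added key3: kamal4.google.com
# deleted key1: kamal2.google.com
# deleted key2: kamal4.google.com
```

Sorting the keys makes the report stable between runs, because Perl does not guarantee hash key order. Note that this result structure does not record whether a whole key was removed (key2) or only one of its values (key1), so the report cannot word those two cases differently without an extra check against the original hashes.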