ruby, set, data-comparison

Ruby - Show Deltas Between 2 Arrays of Hashes Based on a Subset of Hash Keys


I'm attempting to compare two arrays of hashes with very similar hash structure (identical and always-present keys) and return the deltas between the two. Specifically, I'd like to capture the following:

  • Hashes part of array1 that do not exist in array2
  • Hashes part of array2 that do not exist in array1
  • Hashes which appear in both data sets

This typically can be achieved by simply doing the following:

deltas_old_new = (array1-array2)
deltas_new_old = (array2-array1)

The problem for me (which has turned into a 2-3 hour struggle!) is that I need to identify the deltas based on the values of 3 keys within the hash ('id', 'ref', 'name'). The values of these 3 keys are effectively what makes up a unique entry in my data, but I must retain the other key/value pairs of the hash (e.g. 'extra' and numerous other key/value pairs not shown for brevity).
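
To make the difficulty concrete, here is a minimal sketch using two hypothetical rows (they are not part of the example data below) that share 'id', 'ref' and 'name' but differ in 'extra'. The variable names old_rows and new_rows and the helper identity_key are assumptions made up for illustration.

```ruby
# Hypothetical rows: same id/ref/name, different 'extra'.
old_rows = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'first load' }]
new_rows = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'second load' }]

# Hypothetical helper: the three values that identify an entry.
def identity_key(row)
  row.values_at('id', 'ref', 'name')
end

# Array#- compares whole hashes, so the differing 'extra' makes the
# row show up as a delta even though it is the same entry.
whole_hash_delta = old_rows - new_rows

# Comparing only the identifying values treats the rows as the same entry,
# while the surviving rows would still carry all of their key/value pairs.
new_keys  = new_rows.map { |row| identity_key(row) }
key_delta = old_rows.reject { |row| new_keys.include?(identity_key(row)) }
```

Here whole_hash_delta still contains the row, while key_delta is empty, which is the behavior I'm after.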

Example Data:

array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]

array2 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
          {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]

Expected Outcome (3 separate arrays of hashes):

Hashes in array1 but not in array2:

[{'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
 {'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]

Hashes in array2 but not in array1:

[{'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
 {'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
 {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]

Hashes in BOTH array1 and array2:

[{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
 {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'}]

I've made numerous attempts at iterating over the arrays and using Hash#keep_if based on the 3 keys, as well as merging both data sets into a single array and then attempting to de-dup based on array1, but I keep coming up empty-handed. Thank you in advance for your time and assistance!


Solution

  • This isn't very pretty, but it works. It creates a third array containing every item from array1 and array2, de-duplicated on the three identifying keys, and iterates through that.

    Then, since include? doesn't accept a custom matcher, we can fake it with detect, looking for an item in the array whose sub-hash of the three identifying keys matches. Wrapping that in a helper method lets us call it with either array1 or array2 instead of writing it twice.

    Finally, we loop through our array3 and determine whether the item came from array1, array2, or both of them and add to the corresponding output array.

    array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
              {'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
              {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
              {'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]
    
    array2 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
              {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
              {'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
              {'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
              {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]
    
    # combine the arrays into one array holding every distinct item (by id/ref/name) from array1 and array2
    array3 = (array1 + array2).uniq { |item| item.values_at('id', 'ref', 'name') }
    
    # Array#include? doesn't allow a custom matcher, so we can fake it by using Array#detect.
    # Returns the first item whose id/ref/name values match (truthy), or nil if none does.
    def is_included_in(array, object)
      object_identifier = object.values_at('id', 'ref', 'name')

      array.detect do |item|
        item.values_at('id', 'ref', 'name') == object_identifier
      end
    end
    
    # initialize the output arrays
    array1_only = []
    array2_only = []
    array1_and_array2 = []
    
    # loop through all items in both array1 and array2 and check if it was in array1 or array2
    # if it was in both, add to array1_and_array2, otherwise, add it to the output array that
    # corresponds to the input array
    array3.each do |item|
      in_array1 = is_included_in(array1, item)
      in_array2 = is_included_in(array2, item)
    
      if in_array1 && in_array2
        array1_and_array2.push item
      elsif in_array1
        array1_only.push item
      else
        array2_only.push item
      end
    end
    
    
    puts array1_only.inspect        # => [{"id"=>"2", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"}, {"id"=>"7", "ref"=>"1007", "name"=>"OR", "extra"=>"Not Sorted On 11"}]
    puts array2_only.inspect        # => [{"id"=>"8", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"}, {"id"=>"5", "ref"=>"1005", "name"=>"MT", "extra"=>"Not Sorted On 10"}, {"id"=>"12", "ref"=>"1012", "name"=>"TX", "extra"=>"Not Sorted On 85"}]
    puts array1_and_array2.inspect  # => [{"id"=>"1", "ref"=>"1001", "name"=>"CA", "extra"=>"Not Sorted On 5"}, {"id"=>"3", "ref"=>"1003", "name"=>"WA", "extra"=>"Not Sorted On 9"}]
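
As an alternative to scanning with detect for every item, the same three-way split can be sketched with Set lookups on the identifying keys. This is only a sketch: identity_key and split_by_identity are hypothetical helper names, not part of the code above, and sample1/sample2 simply repeat the question's data under new names. Like the loop above, it keeps array1's copy of any row that appears in both arrays.

```ruby
require 'set'

# Hypothetical helper: the three values that identify an entry.
def identity_key(row)
  row.values_at('id', 'ref', 'name')
end

# Hypothetical helper: returns [left_only, right_only, in_both],
# matching rows on identity_key and keeping every other key/value pair.
def split_by_identity(left, right)
  left_keys  = left.map  { |row| identity_key(row) }.to_set
  right_keys = right.map { |row| identity_key(row) }.to_set

  # partition returns [rows where the block is truthy, the remaining rows]
  in_both, left_only = left.partition { |row| right_keys.include?(identity_key(row)) }
  right_only = right.reject { |row| left_keys.include?(identity_key(row)) }

  [left_only, right_only, in_both]
end

sample1 = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5' },
           { 'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7' },
           { 'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9' },
           { 'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11' }]

sample2 = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5' },
           { 'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9' },
           { 'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7' },
           { 'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10' },
           { 'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85' }]

only1, only2, both = split_by_identity(sample1, sample2)
```

With the sample data, only1 holds the rows with ids 2 and 7, only2 holds ids 8, 5 and 12, and both holds ids 1 and 3. Building the key sets once avoids rescanning each array for every item, which may matter if the arrays grow large.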