arrays json jq set-intersection set-difference

Find common and unique items between two arrays


I use the ec2.py dynamic inventory script with Ansible to extract a list of EC2 hosts and their tag names. It returns JSON like the following:

  "tag_aws_autoscaling_groupName_asg_test": [
    "aa.b.bb.55",
    "1b.b.c.d"
  ],

  "tag_aws_autoscaling_groupName_asg_unknown": [
    "aa.b.bb.55",
    "1b.b.c.e"
  ],

I'm using jq to parse this output.

  1. How can I extract only the entries common to both of these ASGs?
  2. How can I extract only the entries that appear in just one of these ASGs?

Solution

    difference/2

    jq's "-" operator on arrays removes every occurrence of each element of the right-hand array from the left-hand one, so one invocation of unique (on the left-hand array) is sufficient to produce a "uniquified" answer:

    def difference($a; $b): ($a | unique) - $b;
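
As a minimal sketch of how difference/2 could be applied to the data in the question: the fragment from the question is wrapped in an enclosing object so that it is valid JSON (an assumption about the full ec2.py output), and the shell variable names are made up for illustration. The `$a`-style parameters need jq 1.5 or later.

```shell
# Sample inventory shaped like the fragment in the question; the enclosing
# braces are an assumption so that the input is valid JSON.
inventory='{
  "tag_aws_autoscaling_groupName_asg_test": ["aa.b.bb.55", "1b.b.c.d"],
  "tag_aws_autoscaling_groupName_asg_unknown": ["aa.b.bb.55", "1b.b.c.e"]
}'

# Hosts that are in asg_test but not in asg_unknown.
only_in_test=$(printf '%s' "$inventory" | jq -c '
  def difference($a; $b): ($a | unique) - $b;
  difference(.tag_aws_autoscaling_groupName_asg_test;
             .tag_aws_autoscaling_groupName_asg_unknown)
')
echo "$only_in_test"
```

With this sample input, the only host left over is the one that appears in asg_test alone.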
    

    Similarly, for the symmetric difference, a single sorting operation is sufficient to produce a "uniquified" value:

    def sdiff($a; $b): (($a-$b) + ($b-$a)) | unique;
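
For the second question (entries that appear in only one of the two ASGs), sdiff/2 can be used roughly as follows. This is a sketch under the same assumptions as the question's data: the fragment is wrapped in an object to make it valid JSON, and the shell variable names are illustrative only.

```shell
# Sample inventory shaped like the fragment in the question; the enclosing
# braces are an assumption so that the input is valid JSON.
inventory='{
  "tag_aws_autoscaling_groupName_asg_test": ["aa.b.bb.55", "1b.b.c.d"],
  "tag_aws_autoscaling_groupName_asg_unknown": ["aa.b.bb.55", "1b.b.c.e"]
}'

# Hosts that belong to exactly one of the two groups.
unique_hosts=$(printf '%s' "$inventory" | jq -c '
  def sdiff($a; $b): (($a-$b) + ($b-$a)) | unique;
  sdiff(.tag_aws_autoscaling_groupName_asg_test;
        .tag_aws_autoscaling_groupName_asg_unknown)
')
echo "$unique_hosts"
```

The shared host "aa.b.bb.55" is dropped, and the two remaining hosts come back as one sorted array.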
    

    intersect/2

    Here is a faster version of intersect/2 that should work with all versions of jq -- it eliminates group_by in favor of sort:

    def intersect(x;y):
      ( (x|unique) + (y|unique) | sort) as $sorted
      | reduce range(1; $sorted|length) as $i
          ([];
           if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end) ;
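
For the first question (entries common to both ASGs), intersect/2 could be invoked like this. Again a sketch: the question's fragment is wrapped in an object to make it valid JSON, and the shell variable names are hypothetical.

```shell
# Sample inventory shaped like the fragment in the question; the enclosing
# braces are an assumption so that the input is valid JSON.
inventory='{
  "tag_aws_autoscaling_groupName_asg_test": ["aa.b.bb.55", "1b.b.c.d"],
  "tag_aws_autoscaling_groupName_asg_unknown": ["aa.b.bb.55", "1b.b.c.e"]
}'

# Hosts present in both groups, returned as a single array.
common=$(printf '%s' "$inventory" | jq -c '
  def intersect(x;y):
    ( (x|unique) + (y|unique) | sort) as $sorted
    | reduce range(1; $sorted|length) as $i
        ([];
         if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end) ;
  intersect(.tag_aws_autoscaling_groupName_asg_test;
            .tag_aws_autoscaling_groupName_asg_unknown)
')
echo "$common"
```

Because each input is de-duplicated before the two are concatenated and sorted, a value can only sit next to an equal value if it came from both arrays.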
    

    intersection/2

    If you have jq 1.5, then here's a similar but still measurably faster set-intersection function: it produces a stream of the elements in the set-intersection of the two arrays:

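    def intersection(x;y):
      # Sort and de-duplicate both inputs so they can be walked in step.
      (x|unique) as $x | (y|unique) as $y
      | ($x|length) as $m
      | ($y|length) as $n
      | if $m == 0 or $n == 0 then empty
        # .i and .j are the indices of the last elements consumed from $x and $y.
        else { i:-1, j:-1, ans:false }
        | while(  .i < $m and .j < $n;
            $x[.i+1] as $nextx
            # Equal heads are a match: advance both sides. Otherwise advance the smaller side.
            | if $nextx == $y[.j+1] then {i:(.i+1), j:(.j+1), ans: true, value: $nextx}
              elif  $nextx < $y[.j+1] then .i += 1 | .ans = false
              else  .j += 1 | .ans = false
              end )
        end
      # Emit only the states that recorded a match.
      | if .ans then .value else empty end ;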
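
Since intersection/2 produces a stream rather than an array, a caller who wants an array can wrap the call in `[...]`. The following sketch applies it to the question's data, wrapped in an object to make it valid JSON (an assumption); the shell variable names are illustrative.

```shell
# Sample inventory shaped like the fragment in the question; the enclosing
# braces are an assumption so that the input is valid JSON.
inventory='{
  "tag_aws_autoscaling_groupName_asg_test": ["aa.b.bb.55", "1b.b.c.d"],
  "tag_aws_autoscaling_groupName_asg_unknown": ["aa.b.bb.55", "1b.b.c.e"]
}'

# Collect the streamed common hosts into one array with [...].
common_stream=$(printf '%s' "$inventory" | jq -c '
  def intersection(x;y):
    (x|unique) as $x | (y|unique) as $y
    | ($x|length) as $m
    | ($y|length) as $n
    | if $m == 0 or $n == 0 then empty
      else { i:-1, j:-1, ans:false }
      | while(  .i < $m and .j < $n;
          $x[.i+1] as $nextx
          | if $nextx == $y[.j+1] then {i:(.i+1), j:(.j+1), ans: true, value: $nextx}
            elif  $nextx < $y[.j+1] then .i += 1 | .ans = false
            else  .j += 1 | .ans = false
            end )
      end
    | if .ans then .value else empty end ;
  [ intersection(.tag_aws_autoscaling_groupName_asg_test;
                 .tag_aws_autoscaling_groupName_asg_unknown) ]
')
echo "$common_stream"
```

Dropping the surrounding `[...]` would instead print each common host on its own line, which can be convenient when piping the result into another command.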