Is there a way to compare two json files in jq? Specifically, I'd like to be able to remove objects from one json file if they occur in another json file. Basically, subtract one file from another. It would be a bonus if I could generalize this so that I could define the equality criteria for the objects, but this is not strictly necessary, it can be based strictly on the objects being identical.
So the more general case would look like this. Let's say I have a file that looks like this:
[
{
"name": "Cynthia",
"surname": "Craig",
"isActive": true,
"balance": "$2,426.88"
},
{
"name": "Elise",
"surname": "Long",
"isActive": false,
"balance": "$1,892.72"
},
{
"name": "Hyde",
"surname": "Adkins",
"isActive": true,
"balance": "$1,769.34"
},
{
"name": "Matthews",
"surname": "Jefferson",
"isActive": true,
"balance": "$1,991.42"
},
{
"name": "Kris",
"surname": "Norris",
"isActive": false,
"balance": "$2,137.11"
}
]
And I have a second file that looks like this:
[
{
"name": "Cynthia",
"surname": "Craig"
},
{
"name": "Kris",
"surname": "Norris"
}
]
I'd like to remove any objects from the first file where the name and surname fields match an object of the second file, so that the results should look like this:
[
{
"name": "Elise",
"surname": "Long",
"isActive": false,
"balance": "$1,892.72"
},
{
"name": "Hyde",
"surname": "Adkins",
"isActive": true,
"balance": "$1,769.34"
},
{
"name": "Matthews",
"surname": "Jefferson",
"isActive": true,
"balance": "$1,991.42"
}
]
The following solution is intended to be generic, efficient and as simple as possible subject to the first two objectives.
For genericity, let us suppose that $one and $two are two arrays of
JSON entities, and that we wish to find those items, $x, in $one
such that ($x|filter) does not appear in map($two | filter), where filter
is an arbitrary filter. (In the present instance, it is {surname, name}
.)
The solution uses INDEX/1
, which was added to jq after the official 1.5 release, so we begin by reproducing its definition:
def INDEX(stream; idx_expr):
reduce stream as $row ({};
.[$row|idx_expr|
if type != "string" then tojson
else .
end] |= $row);
def INDEX(idx_expr): INDEX(.[]; idx_expr);
For efficiency, we will need to use a JSON object as a dictionary;
since keys must be strings, we will need to ensure that when converting an object
to a string, the objects are normalized. For this, we define normalize
as follows:
# Normalize the input with respect to the order of keys in objects
def normalize:
. as $in
| if type == "object" then reduce keys[] as $key
( {}; . + { ($key): ($in[$key] | normalize) } )
elif type == "array" then map( normalize )
else .
end;
To construct the dictionary, we simply apply (normalize|tojson):
def todict(filter):
INDEX(filter| normalize | tojson);
The solution is now quite simple:
# select those items from the input stream for which
# (normalize|tojson) is NOT in dict:
def MINUS(filter; $dict):
select( $dict[filter | normalize | tojson] | not);
def difference($one; $two; filter):
($two | todict(filter)) as $dict
| $one[] | MINUS( filter; $dict );
difference( $one; $two; {surname, name} )
$ jq -n --argfile one one.json --argfile two two.json -f difference.jq