I have a large array of objects stored in a master JSON file. I want to loop through that array, take each object, and append it to a new file based on a field in the object (in this case, the state name). In other words, given a data set covering many states, I want to split it into one file per state.
I'm using an existing jq expression to keep only the fields I actually need:
{ fipscode: .fipscode, level: .level, polid: .polid, polnum: .polnum, precinctsreporting: .precinctsreporting, precinctsreportingpct: .precinctsreportingpct, precinctstotal: .precinctstotal, raceid: .raceid, runoff: .runoff, statepostal: .statepostal, votecount: .votecount, votepct: .votepct, winner: .winner }
Here's a sample of my input:
[
{ "ballotorder": 2, "candidateid": "9718", "delegatecount": 0, "description": null, "electiondate": "2018-08-28", "electtotal": 0, "electwon": 0, "fipscode": null, "first": "Doug", "id": "3015-polid-64364-state-AZ-1", "incumbent": true, "initialization_data": false, "is_ballot_measure": false, "last": "Ducey", "lastupdated": "2018-08-30T00:01:38.897Z", "level": "state", "national": true, "officeid": "G", "officename": "Governor", "party": "GOP", "polid": "64364", "polnum": "5554", "precinctsreporting": 1488, "precinctsreportingpct": 0.9993000000000001, "precinctstotal": 1489, "raceid": "3015", "racetype": "Primary", "racetypeid": "R", "reportingunitid": "state-AZ-1", "reportingunitname": null, "runoff": false, "seatname": null, "seatnum": null, "statename": "Arizona", "statepostal": "AZ", "test": false, "uncontested": false, "votecount": 355455, "votepct": 0.705493, "winner": true },
{ "ballotorder": 2, "candidateid": "21689", "delegatecount": 0, "description": null, "electiondate": "2018-08-28", "electtotal": 0, "electwon": 0, "fipscode": null, "first": "Ron", "id": "10046-polid-62557-state-FL-1", "incumbent": false, "initialization_data": false, "is_ballot_measure": false, "last": "DeSantis", "lastupdated": "2018-08-29T19:29:50.367Z", "level": "state", "national": true, "officeid": "G", "officename": "Governor", "party": "GOP", "polid": "62557", "polnum": "13918", "precinctsreporting": 5968, "precinctsreportingpct": 1.0, "precinctstotal": 5968, "raceid": "10046", "racetype": "Primary", "racetypeid": "R", "reportingunitid": "state-FL-1", "reportingunitname": null, "runoff": false, "seatname": null, "seatnum": null, "statename": "Florida", "statepostal": "FL", "test": false, "uncontested": false, "votecount": 913997, "votepct": 0.564728, "winner": true },
{ "ballotorder": 2, "candidateid": "45555", "delegatecount": 0, "description": null, "electiondate": "2018-08-28", "electtotal": 0, "electwon": 0, "fipscode": null, "first": "Rex", "id": "38538-polid-67011-state-OK-1", "incumbent": false, "initialization_data": false, "is_ballot_measure": false, "last": "Lawhorn", "lastupdated": "2018-08-29T02:44:44.610Z", "level": "state", "national": true, "officeid": "G", "officename": "Governor", "party": "Lib", "polid": "67011", "polnum": "40784", "precinctsreporting": 1951, "precinctsreportingpct": 1.0, "precinctstotal": 1951, "raceid": "38538", "racetype": "Runoff", "racetypeid": "L", "reportingunitid": "state-OK-1", "reportingunitname": null, "runoff": false, "seatname": null, "seatnum": null, "statename": "Oklahoma", "statepostal": "OK", "test": false, "uncontested": false, "votecount": 379, "votepct": 0.409287, "winner": false }
]
As output, I would expect an Arizona.json containing only the item(s) from that state, with the unwanted fields removed:
[
{ "fipscode": null, "level": "state", "polid": "64364", "polnum": "5554", "precinctsreporting": 1488, "precinctsreportingpct": 0.9993000000000001, "precinctstotal": 1489, "raceid": "3015", "runoff": false, "statepostal": "AZ", "votecount": 355455, "votepct": 0.705493, "winner": true }
]
...and likewise for the other states involved (Florida.json and Oklahoma.json).
Here's the bash and jq script I have so far:
cat master.json |
jq -cn --stream 'fromstream(1|truncate_stream(inputs))' |
jq -c '.statename as $state | {
fipscode: .fipscode,
level: .level,
polid: .polid,
polnum: .polnum,
precinctsreporting: .precinctsreporting,
precinctsreportingpct: .precinctsreportingpct,
precinctstotal: .precinctstotal,
raceid: .raceid,
runoff: .runoff,
statepostal: .statepostal,
votecount: .votecount,
votepct: .votepct,
winner: .winner
}'
What I can't figure out is how to intercept each row so I can determine where the output should go. Is this possible?
You can do this with one copy of jq splitting the data items out of the input file, and then another instance per state collating those items back into an array, with bash providing the glue. The following example requires bash 4.2 or newer (it might work with 4.1, but I'd need to check).
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*|4.[01].*) echo "ERROR: Bash 4.2 required" >&2; exit 1;; esac
input_file=$1
[[ -s $input_file ]] || { echo "Usage: ${0##*/} input-file" >&2; exit 1; }
jq_split_script='
  # modify this function to fit your needs
  def relevantContentOnly:
    { fipscode, level, polid, polnum, precinctsreporting, precinctsreportingpct, precinctstotal, raceid, runoff, statepostal, votecount, votepct, winner };
  .[] | [.statename, (relevantContentOnly | tojson)] | @tsv
'
# Use an associative array to map from state names to output FDs
declare -A out_fds=( )
# Read state / line-of-data pairs from our JQ script...
while IFS=$'\t' read -r state data; do
  # If we don't already have a writer for the current state, start one.
  if [[ ! ${out_fds[$state]} ]]; then
    exec {new_fd}> >(jq -n '[inputs]' >"$state.json")
    out_fds[$state]=$new_fd
  fi
  # Regardless, send the data to the FD we have for this state
  printf '%s\n' "$data" >&${out_fds[$state]}
done < <(jq -rc "$jq_split_script" <"$input_file") # ...running the JQ script above.
# close output FDs, so the JQ instances all flush
# (iterate over the array's values, which are the FD numbers, not its keys, which are state names)
for fd in "${out_fds[@]}"; do
  exec {fd}>&-
done
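To try the routing step on its own, here is a minimal sketch of the same pattern with jq taken out of the picture. It is not the script above: `route_by_key` and the `.jsonl` suffix are names made up for this illustration, and because each FD points straight at a file rather than at a `jq -n '[inputs]'` collator, every output file ends up with one JSON document per line instead of a single array.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the script above): reads "key<TAB>payload"
# lines on stdin and writes each payload to "<key>.jsonl", keeping one open
# file descriptor per key.
route_by_key() {
  local -A fds=()            # maps key -> file descriptor number
  local key payload fd
  while IFS=$'\t' read -r key payload; do
    # First time we see this key: open its file and remember the FD.
    if [[ -z ${fds[$key]:-} ]]; then
      exec {fd}>"$key.jsonl" # bash picks a free FD and stores its number in $fd
      fds[$key]=$fd
    fi
    # Send the payload to whichever FD belongs to this key.
    printf '%s\n' "$payload" >&"${fds[$key]}"
  done
  # Close every FD we opened (the array's values, not its keys).
  for fd in "${fds[@]}"; do
    exec {fd}>&-
  done
}
```

Piping something like `printf 'Arizona\t{"a":1}\nFlorida\t{"a":2}\n'` into `route_by_key` should leave an `Arizona.jsonl` and a `Florida.jsonl` in the current directory. Swapping the plain file redirection for a process substitution running jq is what turns those lines back into a proper JSON array per state.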