I am trying to transform job data from the LSF
job scheduler into a summary table of hosts and statuses. For example this might be some sample data:
$ bjobs -q normal -a -uall -o 'exec_host stat' -json
{
"COMMAND":"bjobs",
"JOBS":5,
"RECORDS":[
{
"EXEC_HOST":"compute-node-1",
"STAT":"RUN"
},
{
"EXEC_HOST":"compute-node-1",
"STAT":"DONE"
},
{
"EXEC_HOST":"compute-node-2",
"STAT":"RUN"
},
{
"EXEC_HOST":"compute-node-1",
"STAT":"EXIT"
},
{
"EXEC_HOST":"compute-node-2",
"STAT":"RUN"
},
And I want output that looks like:
RUN DONE EXIT
compute-node-1 1 1 1
compute-node-2 2
I can accomplish this through some really awkward contortions using datamash
, but minimizing the workflow to bjobs
and jq
would significantly improve maintainability. I'm struggling to come up with the recipe to summarize the unique EXEC_HOST
/STAT
values.
Is there a way within jq
to summarize this data as above?
With your input, the following invocation produces TSV output, as shown:
jq -r '
.RECORDS
| (map(.STAT) | unique) as $statuses
| reduce .[] as $x (null; .[$x.EXEC_HOST][$x.STAT] += 1)
| [null, $statuses[]],
(to_entries[] | [.key, .value[$statuses[]]])
| @tsv
DONE EXIT RUN
compute-node-1 1 1 1
compute-node-2 2
You can easily tweak the above, e.g. if you want more control over the ordering of the columns. Also, you might want to use @csv
instead of @tsv
, etc....