Search code examples
jsonpivot-tablejqexport-to-csv

Translate non-uniqified json output into a summary table using jq


I am trying to transform job data from the LSF job scheduler into a summary table of hosts and statuses. For example this might be some sample data:

$ bjobs -q normal -a -uall -o 'exec_host stat' -json
{
  "COMMAND":"bjobs",
  "JOBS":5,
  "RECORDS":[
    {
      "EXEC_HOST":"compute-node-1",
      "STAT":"RUN"
    },
    {
      "EXEC_HOST":"compute-node-1",
      "STAT":"DONE"
    },
    {
      "EXEC_HOST":"compute-node-2",
      "STAT":"RUN"
    },
    {
      "EXEC_HOST":"compute-node-1",
      "STAT":"EXIT"
    },
    {
      "EXEC_HOST":"compute-node-2",
      "STAT":"RUN"
    },

And I want output that looks like:

                 RUN   DONE   EXIT
compute-node-1   1     1      1
compute-node-2   2

I can accomplish this through some really awkward contortions using datamash, but minimizing the workflow to bjobs and jq would significantly improve maintainability. I'm struggling to come up with the recipe to summarize the unique EXEC_HOST/STAT values.

Is there a way within jq to summarize this data as above?


Solution

  • With your input, the following invocation produces TSV output, as shown:

    jq -r '
      .RECORDS 
      | (map(.STAT) | unique) as $statuses
      | reduce .[] as $x (null; .[$x.EXEC_HOST][$x.STAT] += 1)
      | [null, $statuses[]],
         (to_entries[] | [.key, .value[$statuses[]]])
      | @tsv
    
        DONE    EXIT    RUN
    compute-node-1  1   1   1
    compute-node-2          2
    

    You can easily tweak the above, e.g. if you want more control over the ordering of the columns. Also, you might want to use @csv instead of @tsv, etc....