Search code examples
jsonjqdata-partitioning

Remove matching/non-matching elements of a nested array using jq


I need to split the results of a sonarqube analysis history into individual files. Assuming a starting input below,

    {
  "paging": {
    "pageIndex": 1,
    "pageSize": 100,
    "total": 3
  },
  "measures": [
    {
      "metric": "coverage",
      "history": [
        {
          "date": "2018-11-18T12:37:08+0000",
          "value": "100.0"
        },
        {
          "date": "2018-11-21T12:22:39+0000",
          "value": "100.0"
        },
        {
          "date": "2018-11-21T13:09:02+0000",
          "value": "100.0"
        }
      ]
    },
    {
      "metric": "bugs",
      "history": [
        {
          "date": "2018-11-18T12:37:08+0000",
          "value": "0"
        },
        {
          "date": "2018-11-21T12:22:39+0000",
          "value": "0"
        },
        {
          "date": "2018-11-21T13:09:02+0000",
          "value": "0"
        }
      ]
    },
    {
      "metric": "vulnerabilities",
      "history": [
        {
          "date": "2018-11-18T12:37:08+0000",
          "value": "0"
        },
        {
          "date": "2018-11-21T12:22:39+0000",
          "value": "0"
        },
        {
          "date": "2018-11-21T13:09:02+0000",
          "value": "0"
        }
      ]
    }
  ]
}

How do I use jq to clean the results so it only retains the history array entries for each element? The desired output is something like this (output-20181118123808.json for analysis done on "2018-11-18T12:37:08+0000"):

{
  "paging": {
    "pageIndex": 1,
    "pageSize": 100,
    "total": 3
  },
  "measures": [
    {
      "metric": "coverage",
      "history": [
        {
          "date": "2018-11-18T12:37:08+0000",
          "value": "100.0"
        }
      ]
    },
    {
      "metric": "bugs",
      "history": [
        {
          "date": "2018-11-18T12:37:08+0000",
          "value": "0"
        }
      ]
    },
    {
      "metric": "vulnerabilities",
      "history": [
        {
          "date": "2018-11-18T12:37:08+0000",
          "value": "0"
        }
      ]
    }
  ]
}

I am lost on how to operate only on the sub-elements while leaving the parent structure intact. The naming of the JSON file is going to be handled externally from the jq utility. The sample data provided will be split into 3 files. Some other input can have a variable number of entries, some may be up to 10000. Thanks.


Solution

  • Here is a solution which uses awk to write the distinct files. The solution assumes that the dates for each measure are the same and in the same order, but imposes no limit on the number of distinct dates, or the number of distinct measures.

    jq -c 'range(0; .measures[0].history|length) as $i
      | (.measures[0].history[$i].date|gsub("[^0-9]";"")),  # basis of filename
        reduce range(0; .measures|length) as $j (.;
          .measures[$j].history |= [.[$i]])' input.json |
    awk -F\\t 'fn {print >> fn; fn="";next}{fn="output-" $1 ".json"}'
    

    Comments

    The choice of awk here is just for convenience.

    The disadvantage of this approach is that if each file is to be neatly formatted, an additional run of a pretty-printer (such as jq) would be required for each file. Thus, if the output in each file is required to be neat, a case could be made for running jq once for each date, thus obviating the need for the post-processing (awk) step.

    If the dates of the measures are not in lock-step, then the same approach as above could still be used, but of course the gathering of the dates and the corresponding measures would have to be done differently.

    Output

    The first two lines produced by the invocation of jq above are as follows:

    "201811181237080000"
    {"paging":{"pageIndex":1,"pageSize":100,"total":3},"measures":[{"metric":"coverage","history":[{"date":"2018-11-18T12:37:08+0000","value":"100.0"}]},{"metric":"bugs","history":[{"date":"2018-11-18T12:37:08+0000","value":"0"}]},{"metric":"vulnerabilities","history":[{"date":"2018-11-18T12:37:08+0000","value":"0"}]}]}