Search code examples
amazon-redshiftjqinspec

How to get a simple JSON data file from a complex JSON file using jq


I have a complex json file, nested to 4th and 5th levels and I am trying to get the below result using jq. Any help would be appreciated:

{
  "Name": "unix-global",
  "Title": "AWS cli should be installed",
  "desc": "System Package aws-cli should be installed",
  "result": "passed"
} 
{
  "Name": "unix-global",
  "Title": "AWS cli should be installed",
  "desc": "Service besclient should be installed",
  "result": "failed"
}

This is a json file that i get as a result of running an inspec profile. The real aim is to extract the only needed info into a simple json so that I can finally update a AWS Redshift database.

{
  "version": "1.7.1",
  "profiles": [{
    "name": "java",
    "title": "InSpec Java in system",
    "maintainer": "awim",
    "copyright": "awim / mtaqwim",
    "copyright_email": "[email protected]",
    "license": "All Rights Reserved",
    "summary": "An InSpec Compliance Profile",
    "version": "0.0.1",
    "supports": [],
    "controls": [{
      "title": "identify java in system",
      "desc": "identify java in PATH system",
      "impact": 0.3,
      "refs": [],
      "tags": {},
      "code": "control 'java-1.0' do\n  impact 0.3\n  title 'identify java in system'\n  desc 'identify java in PATH system'\n\n  describe java_info do\n    it{ should exist }\n    its(:version){ should match '1.7'}\n  end\nend",
      "source_location": {
        "ref": "inspec/java/controls/java_1.0.rb",
        "line": 6
      },
      "id": "java-1.0",
      "results": [{
        "status": "passed",
        "code_desc": "java_info should exist",
        "run_time": 0.000895896,
        "start_time": "2017-01-20 05:04:47 +0000"
      }, {
        "status": "passed",
        "code_desc": "java_info version should match \"1.7\"",
        "run_time": 0.067581113,
        "start_time": "2017-01-20 05:04:47 +0000"
      }]
    }, {
      "title": "run java from specific path",
      "desc": "run java from specific path",
      "impact": 1.0,
      "refs": [],
      "tags": {},
      "code": "control 'java-2.0' do\n  impact 1.0\n  title 'run java from specific path'\n  desc 'run java from specific path'\n\n  describe java_info(java_path) do\n    it{ should exist }\n    its(:version){ should match '1.7'}\n  end\nend",
      "source_location": {
        "ref": "inspec/java/controls/java_2.0.rb",
        "line": 8
      },
      "id": "java-2.0",
      "results": [{
        "status": "skipped",
        "code_desc": "java_info",
        "skip_message": "Can't find file \"/opt/jdk/current\"",
        "resource": "java_info",
        "run_time": 1.6512e-05,
        "start_time": "2017-01-20 05:04:47 +0000"
      }]
    }, {
      "title": "identify java home",
      "desc": "identify java home match to specific path",
      "impact": 0.1,
      "refs": [],
      "tags": {},
      "code": "control 'java-3.0' do\n  impact 0.1\n  title 'identify java home'\n  desc 'identify java home match to specific path'\n\n  describe java_info(java_path) do\n    its(:java_home){ should match java_path}\n  end\nend",
      "source_location": {
        "ref": "inspec/java/controls/java_3.0.rb",
        "line": 8
      },
      "id": "java-3.0",
      "results": [{
        "status": "skipped",
        "code_desc": "java_info",
        "skip_message": "Can't find file \"/opt/jdk/current\"",
        "resource": "java_info",
        "run_time": 6.139e-06,
        "start_time": "2017-01-20 05:04:47 +0000"
      }]
    }],
    "groups": [{
      "title": "which(UNIX)/where(Windows) java installed",
      "controls": ["java-1.0"],
      "id": "controls/java_1.0.rb"
    }, {
      "title": "which(UNIX)/where(Windows) java installed",
      "controls": ["java-2.0"],
      "id": "controls/java_2.0.rb"
    }, {
      "title": "which(UNIX)/where(Windows) java installed",
      "controls": ["java-3.0"],
      "id": "controls/java_3.0.rb"
    }],
    "attributes": []
  }],
  "other_checks": [],
  "statistics": {
    "duration": 0.069669698
  }
}

Solution

  • Here's a jq filter to flatten this out. Note that the "piping" between filters is essential. You must flatten each parent array before you flatten it's child or you get a cartesian product of them all (which is very bad).

    jq '.profiles[] 
         | { Name: .name , Controls: .controls[] } 
         | { Name: .Name, Desc: .Controls.desc , Title: .Controls.title , Results: .Controls.results[] } 
         | { Name: .Name, Desc: .Desc , Title: .Title , StartTime: .Results.start_time , RunTime: .Results.run_time , Result: .Results.status }'
    

    Line breaks added to the code for clarity

    Output:

    {
      "Name": "java",
      "Desc": "identify java in PATH system",
      "Title": "identify java in system",
      "StartTime": "2017-01-20 05:04:47 +0000",
      "RunTime": 0.000895896,
      "Result": "passed"
    }
    …etc
    

    Once you've flattened it this far I would consider saving it as CSV instead as that will be somewhat simpler to load into Redshift.

     jq '.profiles[] 
         | { Name: .name , Controls: .controls[] } 
         | { Name: .Name, Desc: .Controls.desc , Title: .Controls.title , Results: .Controls.results[] } 
         | [ .Name, .Desc , .Title , .Results.start_time , .Results.run_time , .Results.status ] 
         | @csv '
    

    Output:

    "\"java\",\"identify java in PATH system\",\"identify java in system\",\"2017-01-20 05:04:47 +0000\",0.000895896,\"passed\""
    …etc