Search code examples
jsonexport-to-csvjq

Using JQ to specific csv format


I have a json that looks like this:

[
  {
    "auth": 1,
    "status": "Active",
    "userCustomAttributes": [
      {
        "customAttributeName": "Attribute 1",
        "customAttributeValue": "Value 1"
      },
      {
        "customAttributeName": "Attribute 2",
        "customAttributeValue": "Value 2"
      },
      {
        "customAttributeName": "Attribute 3",
        "customAttributeValue": "Value 3"
      }
    ],
  },
  {
    "auth": 1,
    "status": "Active",
    "userCustomAttributes": [
      {
        "customAttributeName": "Attribute 1",
        "customAttributeValue": "Value 1"
      },
      {
        "customAttributeName": "Attribute 2",
        "customAttributeValue": "Value 2"
      },
      {
        "customAttributeName": "Attribute 3",
        "customAttributeValue": "Value 3"
      },
      {
        "customAttributeName": "Attribute 4",
        "customAttributeValue": "Value 4"
      }
    ],
  }
]

I would like to parse this and have a css output that looks something like this:

authType, status, attribute 1, attribute 2, attribute 3, attribute 4
"1", "active", "value1", "value2", "value3",""
"1", "active", "value1", "value2", "value3","value 4"

The json has over 180k records in the array so it would need to loop through all of them. Some records don't have all the attributes. Some have all 4 yet some only have 1. I am hoping to show a null value in the csv for the records that don't have the attribute.


Solution

  • With your sample input, the following program, which does not depend on the ordering of the "attribute" keys:

    jq -r '
    ["Attribute 1", "Attribute 2", "Attribute 3", "Attribute 4"] as $attributes
    # Header row
    | ["authType", "status"] 
      + ($attributes | map( (.[:1] | ascii_upcase) + .[1:])),
    # Data rows:
      (.[]
       | (INDEX(.userCustomAttributes[]; .customAttributeName)
          | map_values(.customAttributeValue)) as $dict
       | [.auth, .status] + [ $dict[ $attributes[] ] ]
       )
    | @csv
    '
    

    produces the following CSV:

    "authType","status","Attribute 1","Attribute 2","Attribute 3","Attribute 4"
    1,"Active","Value 1","Value 2","Value 3",
    1,"Active","Value 1","Value 2","Value 3","Value 4"
    
    

    You can easily modify this to emit a literal string of your choice in place of a JSON null value.

    Explanation

    $dict[ $a[] ] produces the stream of values:

    $dict[ $a[0] ]
    $dict[ $a[1] ]
    ...
    

    This is used to ensure the columns are produced in the correct order, independently of the ordering or even presence of the keys.