Search code examples
jsonexport-to-csvjq

JSON file to CSV file conversion using jq


I am trying to convert my json file to a csv file using jq. Below is the sample input events.json file.

{
  "took" : 111,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "alerts",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "alertID" : "639387c3-0fbe-4c2b-9387-c30fbe7c2bc6",
          "alertCategory" : "Server Alert",
          "description" : "Successfully started.",
          "logId" : null
          }
       },
       {
        "_index" : "alerts",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "alertID" : "2",
          "alertCategory" : "Server Alert",
          "description" : "Successfully stoped.",
          "logId" : null
          }
       }
   ]
  }
}

My rows in csv should have the data inside each _source tag. So my columns would be alertId , alertCategory , description and logId with its respective data.

I tried the below command : jq --raw-output '.hits[] | [."alertId",."alertCategory",."description",."logId"] | @csv' < /root/events.json and its not working.

Can anyone help me with this?


Solution

  • Your path-expression is not right, you have a hits array inside an object named hits and the fields you trying to put in CSV is present under __source object.

    So your expression should have been below. Use it along with -r flag to put the output in raw output format

    .hits.hits[]._source | [ .alertID, .alertCategory, .description, .logId ] | @csv
    

    If your fields are null, the string representation of your null field value results in just "". If you want an explicit "null" string representation, use the alternate operator along with the field you expect to be null, e.g. instead of .logId, you can do (.logId // "null")

    To add the column name as the header in the output CSV format, you could use the @csv or the join(",") function in raw output format -r

    [ "alertId" , "alertCategory" , "description", "logId" ], 
    ( .hits.hits[]._source |  [ .alertID, .alertCategory, .description, .logId // "null" ]) | @csv
    

    or

    [ "alertId" , "alertCategory" , "description", "logId" ], 
    ( .hits.hits[]._source |  [ .alertID, .alertCategory, .description, .logId // "null" ]) | join(",")