Search code examples
jsoncsvjqexport-to-csv

Converting nested JSON to CSV - specifically an object containing necessary values for each record and an array of values


I want to parse out a set of CSV records from JSON.

The JSON looks like this:

{
   "orderbookSnapshotItem":{
      "exchange":"WhirlyGig",
      "contract":"OptimalSpring",
      "bidPrice":[
         {
            "price":"47.205",
            "counterparty":"WhirlyGig",
            "quantity":"10",
            "allOrNothing":"false",
            "timestamp":"2023-03-02T11:52:36.554Z",
            "orderId":"0",
            "implied":"false"
         },
         {
            "price":"47.200",
            "counterparty":"WhirlyGig",
            "quantity":"5",
            "allOrNothing":"false",
            "timestamp":"2023-03-02T11:52:36.554Z",
            "orderId":"0",
            "implied":"false"
         }
      ],
      "askPrice":[
         {
            "price":"47.295",
            "counterparty":"WhirlyGig",
            "quantity":"10",
            "allOrNothing":"false",
            "timestamp":"2023-03-02T11:52:36.554Z",
            "orderId":"0",
            "implied":"false"
         },
         {
            "price":"47.300",
            "counterparty":"WhirlyGig",
            "quantity":"25",
            "allOrNothing":"false",
            "timestamp":"2023-03-02T11:52:36.554Z",
            "orderId":"0",
            "implied":"false"
         }
      ],
      "qtyDecimals":"0",
      "contractTitle":"Ancient Greek LPG - Apr23",
      "instrumentName":"GDP"
   }
}

No header line is required, I intend to specify all fields required with jq.

I want to unbundle each orderbookSnapshotItem into a set of records which each look like:

  "ICE",
  "TFM FMJ0023!",
  "0",
  "Dutch TTF Natural Gas Futures - TTF - Apr23",
  "TFM",
  "47.295",
  "ICE",
  "10",
  "false",
  "2023-03-02T11:52:36.554Z",
  "0",
  "false"
  "47.295",
  "ICE",
  "10",
  "false",
  "2023-03-02T11:52:36.554Z",
  "0",
  "false"

Or more exactly, as a CSV with no header, but if there was one it would be like this:

exchange,contract,qtyDecimals,contractTitle,instrumentName,bidprice,bidcounterparty,bidquantity,bidallornothing,bidtimestamp,bidorderid,bidimplied,askprice,askcounterparty,askquantity,askallornothing,asktimestamp,askorderid,askimplied

I have got quite a long way however it's just not right and I am struggling. I would be so very grateful if someone would point me in the right direction. It's approaching 01:00am and I can see I am not getting single lines containg both bid & ask but extra lines.

Below is my lamentable jq script.

.orderbookSnapshotItem as $parent | $parent | [.exchange as $ex | .contract as $con | .qtyDecimals as $decimals | .contractTitle as $title | .instrumentName as $name | $parent | .bidPrice[] | $ex, $con, $decimals, $title, $name, .price,.counterparty,.quantity,.allOrNothing,.timestamp,.orderId,.implied | $parent | .askPrice[] | $ex, $con, $decimals, $title, $name, .price,.counterparty,.quantity,.allOrNothing,.timestamp,.orderId,.implied]

Thank you for looking.

Edit: Just to finish off, with enormous thanks to @peak for educating me and pushing me in the right direction with an elegant solution.

The person for whom I was producing this data clarified his requirement upon seeing @peak's solution.

He just wanted a single line with the max bid & min ask. In response, I produced the following jq abomination which works, but doesn't handle missing bids/asks at all:

| $parent | .bidPrice | max_by(.price) as $bidme | $parent | .askPrice | min_by(.price) as $askme
| $parent | ([.exchange, .contract, .qtyDecimals, .contractTitle, .instrumentName,
$bidme.price,$bidme.counterparty,$bidme.quantity,$bidme.allOrNothing,$bidme.timestamp,$bidme.orderId,$bidme.implied,
$askme.price,$askme.counterparty,$askme.quantity,$askme.allOrNothing,$askme.timestamp,$askme.orderId,$askme.implied])? | @csv

Yes - embarrassing, isn't it?

So I looked again at @peak's work and came up with this, a hacked plagiarisation of @peak's solution:

# input: an object, the source of the values
# $obj: an object, the source of the relevant keys
# output: an array
def rec($obj):
  . as $in | [$obj|keys_unsorted[] as $k | $in[$k]];

({} | {price,counterparty,quantity,allOrNothing,timestamp,orderId,implied}) as $fields
| .orderbookSnapshotItem
| [.exchange,.contract,.qtyDecimals, .contractTitle, .instrumentName]  as $common
| ($common + (.bidPrice // [{}] | max_by(.price) | rec($fields)) + (.askPrice // [{}] | min_by(.price) | rec($fields)))?
| @csv

Against the following test.json,

{
   "?xml":{
      "@version":"1.0",
     "@encoding":"UTF-8",
      "@standalone":"yes"
   },
   "orderbookSnapshotItem":{
      "exchange":"WhirlyGig",
      "contract":"OptimalSpring",
      "bidPrice":[
         {
            "price":"47.205",
            "counterparty":"BiddyGig0",
            "quantity":"10",
            "allOrNothing":"false",
            "timestamp":"2023-03-02T11:52:36.554Z",
            "orderId":"0",
            "implied":"false"
         },
         {
            "price":"47.200",
            "counterparty":"BiddyGig1",
            "quantity":"5",
            "allOrNothing":"false",
            "timestamp":"2023-03-02T11:52:36.554Z",
            "orderId":"0",
            "implied":"false"
         }
      ],
      "qtyDecimals":"0",
      "contractTitle":"Ancient Greek LPG - Apr23",
      "instrumentName":"GDP"
   }
}

with the command line (under windows) where query.jq contains the jq script above:

type test.json | .\jq-win64.exe -r -f query.jq

produces the following output:

"WhirlyGig","OptimalSpring","0","Ancient Greek LPG - Apr23","GDP","47.205","BiddyGig0","10","false","2023-03-02T11:52:36.554Z","0","false",,,,,,,

Which is exactly what I wanted.

So it's a big thank you and thumbs up for @peak.


Solution

  • If this isn't exactly what you're after, it will hopefully set you on the right path:

    # input: an object, the source of the values
    # $obj: an object, the source of the relevant keys
    # output: an array
    def array($obj):
      . as $in | [$obj|keys_unsorted[] as $k | $in[$k]];
      
    ({} | {price,counterparty,quantity,allOrNothing,timestamp,orderId,implied}) as $fields
    | .orderbookSnapshotItem 
    | [.exchange,.contract,.qtyDecimals, .contractTitle, .instrumentName]  as $common
    | range(0; .bidPrice|length) as $i
    | ($common + (.bidPrice[$i] | array($fields)) + (.askPrice[$i] | array($fields)))
    | @csv