I want to parse out a set of CSV records from JSON.
The JSON looks like this:
{
"orderbookSnapshotItem":{
"exchange":"WhirlyGig",
"contract":"OptimalSpring",
"bidPrice":[
{
"price":"47.205",
"counterparty":"WhirlyGig",
"quantity":"10",
"allOrNothing":"false",
"timestamp":"2023-03-02T11:52:36.554Z",
"orderId":"0",
"implied":"false"
},
{
"price":"47.200",
"counterparty":"WhirlyGig",
"quantity":"5",
"allOrNothing":"false",
"timestamp":"2023-03-02T11:52:36.554Z",
"orderId":"0",
"implied":"false"
}
],
"askPrice":[
{
"price":"47.295",
"counterparty":"WhirlyGig",
"quantity":"10",
"allOrNothing":"false",
"timestamp":"2023-03-02T11:52:36.554Z",
"orderId":"0",
"implied":"false"
},
{
"price":"47.300",
"counterparty":"WhirlyGig",
"quantity":"25",
"allOrNothing":"false",
"timestamp":"2023-03-02T11:52:36.554Z",
"orderId":"0",
"implied":"false"
}
],
"qtyDecimals":"0",
"contractTitle":"Ancient Greek LPG - Apr23",
"instrumentName":"GDP"
}
}
No header line is required, I intend to specify all fields required with jq.
I want to unbundle each orderbookSnapshotItem into a set of records which each look like:
"ICE",
"TFM FMJ0023!",
"0",
"Dutch TTF Natural Gas Futures - TTF - Apr23",
"TFM",
"47.295",
"ICE",
"10",
"false",
"2023-03-02T11:52:36.554Z",
"0",
"false"
"47.295",
"ICE",
"10",
"false",
"2023-03-02T11:52:36.554Z",
"0",
"false"
Or more exactly, as a CSV with no header, but if there was one it would be like this:
exchange,contract,qtyDecimals,contractTitle,instrumentName,bidprice,bidcounterparty,bidquantity,bidallornothing,bidtimestamp,bidorderid,bidimplied,askprice,askcounterparty,askquantity,askallornothing,asktimestamp,askorderid,askimplied
I have got quite a long way however it's just not right and I am struggling. I would be so very grateful if someone would point me in the right direction. It's approaching 01:00am and I can see I am not getting single lines containg both bid & ask but extra lines.
Below is my lamentable jq script.
.orderbookSnapshotItem as $parent | $parent | [.exchange as $ex | .contract as $con | .qtyDecimals as $decimals | .contractTitle as $title | .instrumentName as $name | $parent | .bidPrice[] | $ex, $con, $decimals, $title, $name, .price,.counterparty,.quantity,.allOrNothing,.timestamp,.orderId,.implied | $parent | .askPrice[] | $ex, $con, $decimals, $title, $name, .price,.counterparty,.quantity,.allOrNothing,.timestamp,.orderId,.implied]
Thank you for looking.
Edit: Just to finish off, with enormous thanks to @peak for educating me and pushing me in the right direction with an elegant solution.
The person for whom I was producing this data clarified his requirement upon seeing @peak's solution.
He just wanted a single line with the max bid & min ask. In response, I produced the following jq abomination which works, but doesn't handle missing bids/asks at all:
| $parent | .bidPrice | max_by(.price) as $bidme | $parent | .askPrice | min_by(.price) as $askme
| $parent | ([.exchange, .contract, .qtyDecimals, .contractTitle, .instrumentName,
$bidme.price,$bidme.counterparty,$bidme.quantity,$bidme.allOrNothing,$bidme.timestamp,$bidme.orderId,$bidme.implied,
$askme.price,$askme.counterparty,$askme.quantity,$askme.allOrNothing,$askme.timestamp,$askme.orderId,$askme.implied])? | @csv
Yes - embarrassing, isn't it?
So I looked again at @peak's work and came up with this, a hacked plagiarisation of @peak's solution:
# input: an object, the source of the values
# $obj: an object, the source of the relevant keys
# output: an array
def rec($obj):
. as $in | [$obj|keys_unsorted[] as $k | $in[$k]];
({} | {price,counterparty,quantity,allOrNothing,timestamp,orderId,implied}) as $fields
| .orderbookSnapshotItem
| [.exchange,.contract,.qtyDecimals, .contractTitle, .instrumentName] as $common
| ($common + (.bidPrice // [{}] | max_by(.price) | rec($fields)) + (.askPrice // [{}] | min_by(.price) | rec($fields)))?
| @csv
Against the following test.json,
{
"?xml":{
"@version":"1.0",
"@encoding":"UTF-8",
"@standalone":"yes"
},
"orderbookSnapshotItem":{
"exchange":"WhirlyGig",
"contract":"OptimalSpring",
"bidPrice":[
{
"price":"47.205",
"counterparty":"BiddyGig0",
"quantity":"10",
"allOrNothing":"false",
"timestamp":"2023-03-02T11:52:36.554Z",
"orderId":"0",
"implied":"false"
},
{
"price":"47.200",
"counterparty":"BiddyGig1",
"quantity":"5",
"allOrNothing":"false",
"timestamp":"2023-03-02T11:52:36.554Z",
"orderId":"0",
"implied":"false"
}
],
"qtyDecimals":"0",
"contractTitle":"Ancient Greek LPG - Apr23",
"instrumentName":"GDP"
}
}
with the command line (under windows) where query.jq contains the jq script above:
type test.json | .\jq-win64.exe -r -f query.jq
produces the following output:
"WhirlyGig","OptimalSpring","0","Ancient Greek LPG - Apr23","GDP","47.205","BiddyGig0","10","false","2023-03-02T11:52:36.554Z","0","false",,,,,,,
Which is exactly what I wanted.
So it's a big thank you and thumbs up for @peak.
If this isn't exactly what you're after, it will hopefully set you on the right path:
# input: an object, the source of the values
# $obj: an object, the source of the relevant keys
# output: an array
def array($obj):
. as $in | [$obj|keys_unsorted[] as $k | $in[$k]];
({} | {price,counterparty,quantity,allOrNothing,timestamp,orderId,implied}) as $fields
| .orderbookSnapshotItem
| [.exchange,.contract,.qtyDecimals, .contractTitle, .instrumentName] as $common
| range(0; .bidPrice|length) as $i
| ($common + (.bidPrice[$i] | array($fields)) + (.askPrice[$i] | array($fields)))
| @csv