I have a single JSON object, as below:
{
"someOtherArray": [ {} , {} ],
"a": [
{
"item1": "item1_value",
"item2": "item2_value"
},
{
"item1": "item1_value",
"item2": "item2_value"
},
{
....
},
100 million more objects
]
}
I'm trying to turn each element of the array into a separate JSON object, as below:
{ "a": { "item1": "item1_value", "item2": "item2_value" } }
{ "a": { "item1": "item1_value", "item2": "item2_value" } }
The raw file has millions of nested objects in a single JSON array, which I want to split into individual JSON objects.
This is a response to the revised question (i.e., "I just want 'a'").
You could just tweak the standard answer:
jq --stream -nc '
{"a": fromstream(2|truncate_stream(inputs | select(.[0][0]=="a")) )}
'
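For comparison, the same transformation is easy to express without streaming. Here is a minimal Python sketch of the non-streaming equivalent (`split_array` is a hypothetical helper, not part of any library); note that it keeps the whole document in memory, so it is only suitable for inputs far smaller than the one described in the question:

```python
import json

def split_array(doc, key="a"):
    """Yield one compact JSON line per element of doc[key].

    Non-streaming equivalent of the jq command above: the whole
    document (here a literal dict; in practice json.load(f)) sits
    in memory, so this does not scale to 100 million objects.
    """
    for obj in doc[key]:
        yield json.dumps({key: obj})

doc = {
    "someOtherArray": [{}, {}],
    "a": [
        {"item1": "item1_value", "item2": "item2_value"},
        {"item1": "item1_value", "item2": "item2_value"},
    ],
}
for line in split_array(doc):
    print(line)
# → {"a": {"item1": "item1_value", "item2": "item2_value"}}  (twice)
```

The jq streaming approach avoids exactly this: it never materializes the full array.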
The jq streaming parser is economical with memory at the expense of execution speed. If the input consists of an array of N small objects, then the execution time should very roughly be linear in N, and the memory requirements should be roughly constant.
To give some idea of what to expect, I created an array of 10^8 objects similar to those described in the question. The file size was 4GB. On a 3GHz machine, reading the file took about 16 minutes of user+system (u+s) time, but the "peak memory footprint" was only 1.2MB.
gojq was slightly slower and required significantly more memory: its "peak memory footprint" was 8.4MB, and I suspect the required memory grows with N.
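For reference, a benchmark input like the one described above can be generated with a short script. This is a hypothetical Python sketch (the answer does not show how its test file was built); the object shape mirrors the question, and the elements are written out one at a time so the generator itself runs in constant memory:

```python
import json

def write_test_file(path, n):
    """Write {"someOtherArray": [{}, {}], "a": [ ...n objects... ]} to path.

    Each array element is serialized and written individually, so the
    generator's memory use stays flat regardless of n. With n = 10**8
    the output is on the order of the 4GB file used in the benchmark.
    """
    obj = json.dumps({"item1": "item1_value", "item2": "item2_value"})
    with open(path, "w") as f:
        f.write('{"someOtherArray": [{}, {}], "a": [\n')
        for i in range(n):
            f.write(obj)
            f.write(',\n' if i < n - 1 else '\n')
        f.write(']}\n')

write_test_file("test.json", 3)  # small n for a quick sanity check
```

The resulting file is valid JSON, so the jq command above (or gojq) can be pointed at it directly.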