How can I concat a list of JSON files into a huge JSON array? I have 5,000 files and 550,000 list items.
My first try was to use jq, but it looks like jq -s is not optimized for a large input.
jq -s -r '[.[][]]' *.js
This command works, but it takes far too long to complete, and I would really like to solve this with Python.
Here is my current code:
import json

def concatFiles(outName, inFileNames):
    # Lazily yield items from each input file, one file at a time
    def listGenerator():
        for inName in inFileNames:
            with open(inName, 'r') as f:
                for item in json.load(f):
                    yield item

    with open(outName, 'w') as f:
        json.dump(listGenerator(), f)
I'm getting:
TypeError: <generator object listGenerator at 0x7f94dc2eb3c0> is not JSON serializable
Any attempt to load all the files into RAM triggers the Linux OOM killer. Do you have any ideas?
You should derive from list and override the __iter__ method.
import json

def gen():
    yield 20
    yield 30
    yield 40

class StreamArray(list):
    def __iter__(self):
        return gen()

    # according to the comment below: report a non-zero length,
    # otherwise json's encoder emits [] without ever iterating
    def __len__(self):
        return 1

a = [1, 2, 3]
b = StreamArray()

print(json.dumps([1, a, b]))
The result is [1, [1, 2, 3], [20, 30, 40]].
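Applied to your concatFiles function, a minimal sketch along these lines might look as follows. It reuses your listGenerator unchanged and swaps the hard-coded gen() for a StreamArray variant that wraps an arbitrary generator:

import json

class StreamArray(list):
    # Variant of the class above that wraps any generator
    # instead of hard-coding gen()
    def __init__(self, generator):
        self.generator = generator

    def __iter__(self):
        return self.generator

    def __len__(self):
        # Non-zero, so the encoder does not short-circuit to []
        return 1

def concatFiles(outName, inFileNames):
    def listGenerator():
        for inName in inFileNames:
            with open(inName, 'r') as f:
                for item in json.load(f):
                    yield item

    with open(outName, 'w') as f:
        # json.dump writes the array element by element, so only one
        # input file's parsed contents is held in memory at a time
        json.dump(StreamArray(listGenerator()), f)

Note that the streaming behaviour relies on json.dump, which in CPython uses the pure-Python iterencode path; json.dumps may take the C-encoder shortcut and materialize the whole sequence in memory first.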