
How to handle huge JSON files?


My JSON file is 5.5 MB (the annotations file of the Object365 dataset for object detection models), and my Python program can't even read it as a plain text file.

```python
import json
from pathlib import Path


def readFileLine(filePath):
    # readline() returns only the first line; this assumes the whole
    # JSON document is stored on a single line.
    p = Path(filePath)
    if not p.is_file():
        print("%s is not a file" % filePath)
        return ""
    with open(filePath, "r") as f:
        text = f.readline()
    return text


def ob365_converter(inputJsonFile, datasetPath):
    text = readFileLine(inputJsonFile)
    print("text:")
    print(text[0:100])
    datasetJson = json.loads(text)  # parses the entire document in memory
    print("dataset loaded")
    for item in datasetJson["annotations"]:
        # Do some operations
        pass
```

The output doesn't even show the first message, "text:". I also tried the following, with the same result:

```python
import json

print("A")
with open(inputJsonFile, "r") as f:
    datasetJson = json.load(f)
print("B")
```

How to handle huge JSON files in Python?


Solution

  • A JSON file this large needs too much memory to be loaded in one piece. As mentioned in @cizario's link, you should use streaming logic that accesses the JSON objects one at a time, without holding the whole content of the file in memory.

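    One library that parses JSON as a stream is stream-json (https://www.npmjs.com/package/stream-json), but note that it is a Node.js package, not a Python one. In Python, the ijson library provides the same kind of incremental parsing.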
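The stream logic described above can be sketched with only the standard library: `json.JSONDecoder.raw_decode` decodes one array element at a time from a text buffer that is refilled in fixed-size chunks, so the whole file is never held in memory. This is a minimal sketch, not the approach from @cizario's link; the function `iter_json_array` and its parameters are hypothetical, and it assumes (a) the first occurrence of the quoted key in the file is the top-level key, and (b) the array's elements are JSON objects, as COCO-style annotation entries are.

```python
import json


def iter_json_array(f, key="annotations", chunk_size=1 << 16):
    """Yield the elements of the array stored under `key`, one at a time.

    Hypothetical helper, not a library API. Memory use is roughly one chunk
    plus the element currently being decoded. Assumptions:
      * the first occurrence of '"<key>"' in the text is the top-level key;
      * the array's elements are JSON objects, so a truncated element always
        fails to decode until more text has been read.
    """
    decoder = json.JSONDecoder()
    marker = '"%s"' % key
    buf = ""

    # Phase 1: skip forward to the '[' that opens the array under `key`.
    while True:
        idx = buf.find(marker)
        if idx != -1:
            bracket = buf.find("[", idx + len(marker))
            if bracket != -1:
                buf = buf[bracket + 1:]
                break
            buf = buf[idx:]  # key seen, '[' not read yet
        else:
            buf = buf[-len(marker):]  # keep a tail in case the key spans two chunks
        chunk = f.read(chunk_size)
        if not chunk:
            raise ValueError("key %r not found" % key)
        buf += chunk

    # Phase 2: decode elements one by one, refilling the buffer as needed.
    while True:
        buf = buf.lstrip()
        if buf.startswith("]"):
            return  # end of the array; the rest of the file is never read
        if buf.startswith(","):
            buf = buf[1:]
            continue
        if buf:
            try:
                obj, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                pass  # element is incomplete: read more text below
            else:
                yield obj
                buf = buf[end:]
                continue
        chunk = f.read(chunk_size)
        if not chunk:
            raise ValueError("unexpected end of file inside the array")
        buf += chunk
```

In `ob365_converter`, the loop would then become `for item in iter_json_array(f, "annotations"):` inside a `with open(inputJsonFile, "r", encoding="utf-8") as f:` block, so the full document is never passed to `json.loads`. A dedicated incremental parser handles more cases (nested keys, arrays of plain numbers) than this sketch does.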