Tags: python, json, python-2.7, ijson

Load an element with Python from large json file


Here is my JSON file. I want to load only the data lists from it, one by one, and then, for example, plot them.

This is only an example: my real data set is large, and loading the whole file at once raises a memory error.

{
  "earth": {
    "europe": [
      {"name": "Paris", "type": "city"},
      {"name": "Thames", "type": "river"}, 
      {"par": 2, "data": [1,7,4,7,5,7,7,6]}, 
      {"par": 2, "data": [1,0,4,1,5,1,1,1]}, 
      {"par": 2, "data": [1,0,0,0,5,0,0,0]}
        ],
    "america": [
      {"name": "Texas", "type": "state"}
    ]
  }
}

Here is what I tried:

import ijson
filename = "testfile.json"

f = open(filename)
mylist = ijson.items(f, 'earth.europe[2].data.item')
print mylist

It returns nothing, even when I convert the result into a list:

[]

Solution

  • You need to specify a valid prefix; ijson prefixes are either keys in a dictionary or the word item for list entries. You can't select a specific list item (so [2] doesn't work).

    If you want the value of the data key from every dictionary in the europe list, the prefix is:

    earth.europe.item.data
    # ^ ------------------- outermost key must be 'earth'
    #       ^ ------------- next key must be 'europe'
    #              ^ ------ any value in the array
    #                   ^   the value for the 'data' key
    

    This produces each such list:

    >>> l = ijson.items(f, 'earth.europe.item.data')
    >>> for data in l:
    ...     print data
    ...
    [1, 7, 4, 7, 5, 7, 7, 6]
    [1, 0, 4, 1, 5, 1, 1, 1]
    [1, 0, 0, 0, 5, 0, 0, 0]
    

    You can't put wildcards in a prefix, so earth.*.item.data, for example, won't work.

    If you need more complex prefix matching, you have to use the ijson.parse() function and handle the events it produces. You can reuse the ijson.ObjectBuilder() class to turn the events you are interested in into Python objects:

    parser = ijson.parse(f)
    for prefix, event, value in parser:
        if event != 'start_array':
            continue
        if prefix.startswith('earth.') and prefix.endswith('.item.data'):
            continent = prefix.split('.', 2)[1]
            builder = ijson.ObjectBuilder()
            builder.event(event, value)
            for nprefix, event, value in parser:
                if (nprefix, event) == (prefix, 'end_array'):
                    break
                builder.event(event, value)
            data = builder.value
            print continent, data
    

    This prints every array stored under a 'data' key of a dictionary inside a list (that is, every array whose prefix ends with '.item.data'), anywhere below the top-level 'earth' key. It also extracts the continent key.
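
Coming back to the [2] in the question: since a prefix cannot address a list index, one possible workaround is to stream the entries with the 'earth.europe.item' prefix and stop at the wanted position. The sketch below is Python 3 and only illustrates that idea. `nth_item` and `sample_europe_items` are hypothetical names, and the generator stands in for the iterator that ijson.items() would return, so the snippet runs without the file.

```python
from itertools import islice


def nth_item(items, index, default=None):
    """Return the element at position `index` of a lazy iterable.

    islice() discards the first `index` elements as they stream past, and
    next() stops pulling from the iterable once the wanted element arrives.
    """
    return next(islice(items, index, None), default)


def sample_europe_items():
    """Stand-in for ijson.items(f, 'earth.europe.item') on the sample file."""
    yield {"name": "Paris", "type": "city"}
    yield {"name": "Thames", "type": "river"}
    yield {"par": 2, "data": [1, 7, 4, 7, 5, 7, 7, 6]}
    yield {"par": 2, "data": [1, 0, 4, 1, 5, 1, 1, 1]}
    yield {"par": 2, "data": [1, 0, 0, 0, 5, 0, 0, 0]}


entry = nth_item(sample_europe_items(), 2)
print(entry["data"])  # [1, 7, 4, 7, 5, 7, 7, 6]
```

With the real file, the call would presumably look like nth_item(ijson.items(f, 'earth.europe.item'), 2), with f opened in binary mode, which recent ijson versions expect. Each entry is still built as a complete Python object, but only one entry is held at a time and nothing after the requested index is read.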