Iterating through a relatively consistent JSON file using try statements in Python


I have a best practice question relating to iterating over a JSON file in Python using try/except statements.

I have a JSON file that looks like this (vastly simplified for the question):

"results": [
      {
       "listingId":"1",
       "address":"123 Main st",
       "landsize":"190 m2"
      },
      {
       "listingId":"2",
       "address":"345 North st",
       "state":"California"
      }
  ]

As I said, this is heavily simplified: in my actual problem there are about 30 key/value pairs I am interested in, and thousands of records. The challenge is that, even though the keys are fairly consistent (it's always roughly the same 30), occasionally a key/value pair is missing.

If one, two, or even 10 are missing, I still want the rest of the record to be written out. My current approach is a try/except statement for each key/value pair, which strikes me as a very inefficient way of checking this, and I am sure there is a better way.

My code looks (roughly) like this (which I am sure is not the best way to do this):

for i in range(len(JSON_data["results"])):
   try:
      print "ListingID=" + JSON_data["results"][i]["listingId"]
   except KeyError:
      print "ListingID is unknown"

   try:
      print "Address=" + JSON_data["results"][i]["address"]
   except KeyError:
      print "Address is unknown"

   try:
      print "landsize=" + JSON_data["results"][i]["landsize"]
   except KeyError:
      print "landsize is unknown"

   try:
      print "state =" + JSON_data["results"][i]["state"]
   except KeyError:
      print "state is unknown"

Any advice appreciated!


Solution

  • You can use the dict.get() method to avoid having to catch an exception:

    listing_id = JSON_data["results"][i].get("listingId")
    

    This returns None if the key is missing, or a different default that you pass in as the second argument. You can also check whether the key is present first:

    if 'listingId' in JSON_data["results"][i]:
        # the key is present, do something with the value
        listing_id = JSON_data["results"][i]['listingId']
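
To make the two forms of `dict.get()` concrete, here is a minimal sketch. The sample record and the `"unknown"` default are made up for illustration and are not taken from the asker's real data.

```python
# Hypothetical record, modeled on the question's second listing.
record = {"listingId": "2", "address": "345 North st", "state": "California"}

# Missing key, no default: get() returns None instead of raising KeyError.
landsize = record.get("landsize")

# Missing key, explicit default passed as the second argument.
landsize_or_default = record.get("landsize", "unknown")

# Present key: get() returns the stored value, just like indexing.
state = record.get("state")

print(landsize, landsize_or_default, state)
```

Note that `get()` cannot tell a missing key apart from a key whose stored value is `None`; if that distinction matters, the `in` test is the safer choice.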
    

    Next, avoid range() here. You are much better off looping directly over the results list, so you can refer to each dictionary directly without repeating the whole JSON_data["results"][i] prefix each time:

    for nesteddict in JSON_data["results"]:
        if 'listingId' in nesteddict:
            listing_id = nesteddict['listingId']
    

    Next, rather than hard-code all the keys you check, use a loop over a list of keys:

    expected_keys = ['listingId', 'address', 'landsize', ...]
    
    for nesteddict in JSON_data["results"]:
        for key in expected_keys:
            if key not in nesteddict:
                print(key, 'is unknown')
            else:
                value = nesteddict[key]
                print('{} = {}'.format(key, value))
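
The loop above can also be packaged as a small function. This is only a sketch under a few assumptions: `describe_listing` is a hypothetical helper name, the key list covers just the four keys shown in the question, and the sample JSON is parsed from a string with `json.loads()` rather than read from the asker's real file.

```python
import json

# Sample data mirroring the question's simplified JSON (not the real file).
raw = '''
{"results": [
    {"listingId": "1", "address": "123 Main st", "landsize": "190 m2"},
    {"listingId": "2", "address": "345 North st", "state": "California"}
]}
'''

# Only the four keys visible in the question; the real list would be longer.
EXPECTED_KEYS = ["listingId", "address", "landsize", "state"]


def describe_listing(record, expected_keys=EXPECTED_KEYS):
    """Return one line per expected key, noting the ones that are missing."""
    lines = []
    for key in expected_keys:
        if key in record:
            lines.append("{} = {}".format(key, record[key]))
        else:
            lines.append("{} is unknown".format(key))
    return lines


JSON_data = json.loads(raw)
for nesteddict in JSON_data["results"]:
    print("\n".join(describe_listing(nesteddict)))
```

Returning the lines instead of printing inside the function keeps the missing-key check separate from the output, so the same helper could feed a file writer instead of the console.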
    

    If you don't need to print that a key is missing, you could also make use of dictionary key views, which act like sets (dict.keys() in Python 3, dict.viewkeys() in Python 2.7). Sets support intersection operations, so you could ask for the intersection between your expected keys and the available keys:

    # note, using a set here now
    expected_keys = {'listingId', 'address', 'landsize', ...}
    
    for nesteddict in JSON_data["results"]:
        for key in nesteddict.keys() & expected_keys:  # get the intersection
            # each key is guaranteed to be in nesteddict
            value = nesteddict[key]
            print('{} = {}'.format(key, value))
    

    This for loop only ever deals with keys both in nesteddict and in expected_keys, nothing more.
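
If you do want to report missing keys while still using set operations, a set difference yields them directly. A minimal sketch, assuming Python 3 (where `dict.keys()` returns a set-like view); `split_keys` is a hypothetical helper name and the record is sample data modeled on the question:

```python
# Only the four keys visible in the question; the real set would be larger.
EXPECTED_KEYS = {"listingId", "address", "landsize", "state"}


def split_keys(record, expected_keys=EXPECTED_KEYS):
    """Return (present, missing) expected keys for one record, both sorted."""
    # Intersection: expected keys that the record actually has.
    present = record.keys() & expected_keys
    # Difference: expected keys the record lacks (iterating a dict yields keys).
    missing = expected_keys.difference(record)
    return sorted(present), sorted(missing)


record = {"listingId": "2", "address": "345 North st", "state": "California"}
present, missing = split_keys(record)
print("present:", present)
print("missing:", missing)
```

Sorting the results is only there to make the output order predictable, since sets are unordered.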