Python - Combining n different json files/dictionaries (n could vary)


I searched for this particular question using the keywords in the title but could not find a good solution.

Say I have a list of JSON files (let's assume the top level is always going to be a dictionary):

"../data/Flickr_EXIF_0.json",
"../data/Flickr_EXIF_150.json",
"../data/Flickr_EXIF_300.json",
"../data/Flickr_EXIF_450.json",

The goal is to combine/merge all the JSON files into one single file.

Of course, this is simple when we know how many JSON files we are merging, for example:

import json

with open("../data/Flickr_EXIF_0.json", "r") as jFl:
    obj1 = json.load(jFl)

with open("../data/Flickr_EXIF_150.json", "r") as jFl:
    obj2 = json.load(jFl) 

with open("../data/Flickr_EXIF_300.json", "r") as jFl:
    obj3 = json.load(jFl) 

with open("../data/Flickr_EXIF_450.json", "r") as jFl:
    obj4 = json.load(jFl) 

d = {**obj1, **obj2, **obj3, **obj4}

But how would you write a function that can combine an unknown number of JSON files? I am looking for a Pythonic solution.

This is my partial solution, which throws an error:

def appendJSON(*inpFl):
    flObjs = []
    for fl in inpFl:
        with open(fl, "r") as jFl:
            flObjs.append(json.load(jFl))

    # something smart here! 
    itemsList = [list(objs.items()) for objs in flObjs]

    return dict(itemsList)

Error:

ValueError                                Traceback (most recent call last)
in ()
     20 "../data/Flickr_EXIF_1350.json",
     21 "../data/Flickr_EXIF_1500.json",
---> 22 "../data/Flickr_EXIF_1650.json")

in appendJSON(*inpFl)
      7     itemsList = [objs.items() for objs in flObjs]
      8 
----> 9     return dict(itemsList)
     10 
     11 objs = appendJSON("../data/Flickr_EXIF_0.json",

ValueError: dictionary update sequence element #0 has length 150; 2 is required

Sample Debug values for itemsList:

[[('5822864395',
   {'date': '2010-06-10 14:48:25',
    'height': 2592,
    'lat': 0.0,
    'long': 0.0,
    'orientation': 0,
    'width': 2818}),
   ('1458886548',
   {'date': '2007-09-22 02:59:20',
    'height': 768,
    'lat': 39.145372,
    'long': -84.508981,
    'orientation': 0,
    'width': 1024})]]
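
The ValueError above is consistent with these debug values: dict() expects an iterable of 2-item (key, value) pairs, but each element of itemsList is itself a whole list of pairs (150 of them in the first file). One possible fix, sketched here with small in-memory dicts standing in for the loaded files (the keys and values are made up for illustration), is to flatten the per-file items into a single stream of pairs with itertools.chain.from_iterable before calling dict():

```python
from itertools import chain

# Hypothetical stand-ins for the objects returned by json.load()
flObjs = [
    {"a": {"width": 1024}, "b": {"width": 2818}},
    {"c": {"width": 640}},
]

# Flatten the (key, value) pairs of every dict into one iterable of pairs,
# which is the shape dict() expects.
merged = dict(chain.from_iterable(obj.items() for obj in flObjs))
print(merged)
```

As with any dict-based merge, a key that occurs in more than one input keeps only the value from the last input.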

Alternative solution:

def appendJSON(*inpFl):
    flObjs = []
    for fl in inpFl:
        with open(fl, "r") as jFl:
            flObjs.append(json.load(jFl))

    for i in range(1,len(flObjs)):
        flObjs[0].update(flObjs[i])

    return flObjs[0]

Solution

  • I would first make a generic solution, then optionally optimize if the top-level types of the JSON files are all the same (i.e. all object/dict, or all array/list).

    If you have a mix of top-level types after loading (dict, list, value), you are not going to be able to combine them anyway. You can only combine them if every loaded item is a dict or every loaded item is a list. If you have a combination, or if one or more files have a plain value at the top level, you cannot combine them.

    The generic approach is to create an empty list and .append() the data loaded by json.load() to it, while keeping track of whether you have seen a dict, a list, or a plain value:

    import json

    def combine(json_file_names):
        combined = []
        have_dict = False
        have_list = False
        for file_name in json_file_names:
            # json.load() expects an open file object, not a file name
            with open(file_name, "r") as json_file:
                data = json.load(json_file)
            combined.append(data)
            if isinstance(data, dict):
                have_dict = True
            elif isinstance(data, list):
                have_list = True
            else:
                # a plain top-level value: flag it as a mixed bag
                have_list = have_dict = True

        # if have_list and have_dict have the same value, either there is nothing
        # loaded or it's a mixed bag. In both cases you can't do anything
        if have_list == have_dict:
            return combined
        if have_list:
            # all top levels are lists: concatenate them
            tmp = []
            for elem in combined:
                tmp.extend(elem)
        else:  # have_dict
            # all top levels are dicts: later files overwrite earlier keys
            tmp = {}
            for elem in combined:
                tmp.update(elem)
        return tmp
    

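    Please note that when combining files whose top levels are all dicts, a key that appears in more than one file keeps only the value from the last file loaded; values from previously loaded data are overwritten.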
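
To make the overwriting behavior concrete, here is a small sketch that uses in-memory dicts rather than files. The helper name merge_checked and the sample data are hypothetical and not part of the answer above; the helper shows one way to raise an error instead of silently overwriting when the same top-level key appears twice:

```python
def merge_checked(dicts):
    """Merge dicts left to right, raising KeyError on a repeated top-level key."""
    merged = {}
    for d in dicts:
        # Keys already merged that also occur in the next dict
        duplicates = merged.keys() & d.keys()
        if duplicates:
            raise KeyError(f"duplicate top-level keys: {sorted(duplicates)}")
        merged.update(d)
    return merged


first = {"5822864395": {"width": 2818}}
second = {"5822864395": {"width": 1024}}

# Plain update(): the later value silently replaces the earlier one.
plain = {}
for d in (first, second):
    plain.update(d)
print(plain)  # {'5822864395': {'width': 1024}}
```

Calling merge_checked([first, second]) on the same data raises a KeyError instead, which may be preferable if duplicate IDs across files would indicate a data problem.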