json, python-3.6, defaultdict

How to remove duplicates from a JSON defaultdict?


(Re-post with accurate data sample)

I have a JSON dictionary where each value is in turn a list of dictionaries (built from a defaultdict), as follows:

    "Parent_Key_A": [{"a": 1.0, "b": 2.0}, {"a": 5.1, "c": 10}, {"b": 20.3, "a": 1.0}]

I am trying to remove both duplicate keys and duplicate values so that each key in the JSON maps to a list of unique values. For the example above, I am looking for output like this:

    "Parent_Key_A": {"a":[1.0,5.1], "b":[2.0,20.3], "c":[10]}

Then I need to write this output to a JSON file. I tried using a set to handle the duplicates, but a set is not JSON serializable.

Any suggestions on how to handle this?


Solution

  • A solution using the itertools.chain.from_iterable() and itertools.groupby() functions:

    import itertools
    import json

    input_d = {"Parent_Key_A": [{"a": 1.0, "b": 2.0}, {"a": 5.1, "c": 10}, {"b": 20.3, "a": 1.0}]}

    # Flatten every inner dict into one stream of (key, value) pairs.
    items = itertools.chain.from_iterable(d.items() for d in input_d["Parent_Key_A"])
    # groupby() only groups adjacent items, so the pairs are sorted first;
    # set() then drops duplicate pairs within each key's group.
    input_d["Parent_Key_A"] = {k: [v for _, v in sorted(set(g))]
                               for k, g in itertools.groupby(sorted(items), key=lambda x: x[0])}
    print(input_d)
    

    The output:

    {'Parent_Key_A': {'a': [1.0, 5.1], 'b': [2.0, 20.3], 'c': [10]}}
    

    Writing to a JSON file:

    # The with-block closes the file once the dump completes.
    with open('output.json', 'w') as f:
        json.dump(input_d, f, indent=4)
    

    output.json contents:

    {
        "Parent_Key_A": {
            "a": [
                1.0,
                5.1
            ],
            "c": [
                10
            ],
            "b": [
                2.0,
                20.3
            ]
        }
    }
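
Since the question mentions both defaultdict and the set serialization problem, the same grouping can also be sketched without itertools. The version below is only an illustration under a few assumptions: the helper name merge_unique is hypothetical (it is not part of the answer above), the values under each key are assumed to be hashable and mutually comparable, and sorted lists are assumed to be an acceptable replacement for sets in the output.

```python
import json
from collections import defaultdict


def merge_unique(records):
    """Collect the distinct values seen for each key across a list of dicts.

    merge_unique is a hypothetical helper name used for illustration.
    """
    grouped = defaultdict(set)
    for record in records:
        for key, value in record.items():
            grouped[key].add(value)  # the set silently drops repeated values
    # Sets are not JSON serializable, so each one becomes a sorted list.
    return {key: sorted(values) for key, values in sorted(grouped.items())}


input_d = {"Parent_Key_A": [{"a": 1.0, "b": 2.0}, {"a": 5.1, "c": 10}, {"b": 20.3, "a": 1.0}]}

# Apply the helper to every parent key, not just "Parent_Key_A".
result = {parent: merge_unique(records) for parent, records in input_d.items()}

print(json.dumps(result))
# {"Parent_Key_A": {"a": [1.0, 5.1], "b": [2.0, 20.3], "c": [10]}}
```

Converting each set to a list just before serialization keeps the deduplication simple while still producing something json.dump() accepts; the result dictionary can be written to a file the same way as input_d above.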