Tags: python, json, dictionary, objectpath

I am trying to use Python's objectpath to pick specific values out of a multi-level JSON/dictionary, but can't get to my desired target format


Imagine I hit an API and it returns a multi-level JSON blob. I want to pull specific values out of that blob and upload them to a database, so I need to flatten it.

Basically I want to move from something like this:

d1 = {'results': [
        {'a': 1, 'b': 10},
        {'a': 2, 'b': 20},
        {'a': 3, 'b': 30, 'c': {'d': 100, 'e': 1000}},
        {'a': 4, 'c': {'d': 200, 'e': 2000}}
    ]
}

To something like this (ideally with the labels adjusted to represent the original hierarchy):

d2 = [
    {'a': 1, 'b': 10},
    {'a': 2, 'b': 20},
    {'a': 3, 'b': 30, 'c.d': 100},
    {'a': 4, 'c.d': 200}
]

I feel like jsonpath or objectpath should be able to do this, but I haven't been able to get it to work. I could traverse this example fairly easily by hand, but I have a bunch of these to do, so something more "declarative" would be much preferable.

I must be missing something on how these path things work. Here's my attempt:

from objectpath import Tree

# starting here...
d1 = {'results': [
        {'a': 1, 'b': 10},
        {'a': 2, 'b': 20},
        {'a': 3, 'b': 30, 'c': {'d': 100, 'e': 1000}},
        {'a': 4, 'c': {'d': 200, 'e': 2000}}
    ]
}

# trying to get here...
# d2 = [
#     {'a': 1, 'b': 10},
#     {'a': 2, 'b': 20},
#     {'a': 3, 'b': 30, 'c.d': 100},
#     {'a': 4, 'c.d': 200}
# ]

if __name__ == "__main__":
    t = Tree(d1)
    print([x for x in t.execute('$.results.a')])  # works to get value of a
    print([x for x in t.execute('$.results.(a,b)')])  # creates dictionary of a & b -- cool
    print([x for x in t.execute('$.results.(a,b,c)')])  # adds all of c's sub document, makes sense
    print([x for x in t.execute('$.results.(a,b,c.d)')])  # nothing changed?
    print([x for x in t.execute('$.results.*')])  # selects everything, sure
    print([x for x in t.execute('$.results.*["a"]')])  # just "a" value again, makes sense
    print([x for x in t.execute('$.results.*["a" or "b"]')])  # apparently this means HAS "A" or "B" -- weird?
    print([x for x in t.execute('$.results..(a,b,d)')])  # almost works but puts d in its own dictionary?!
    print([x for x in t.execute('{"a": $.results.a, "b": $.results.b, "c.d":  $.results.c.d}')])  # what I would expect, but not even close

Results:

[1, 2, 3, 4]
[{'b': 10, 'a': 1}, {'b': 20, 'a': 2}, {'b': 30, 'a': 3}, {'a': 4}]
[{'b': 10, 'a': 1}, {'b': 20, 'a': 2}, {'b': 30, 'c': {'d': 100, 'e': 1000}, 'a': 3}, {'c': {'d': 200, 'e': 2000}, 'a': 4}]
[{'b': 10, 'a': 1}, {'b': 20, 'a': 2}, {'b': 30, 'c': {'d': 100, 'e': 1000}, 'a': 3}, {'c': {'d': 200, 'e': 2000}, 'a': 4}]
[{'b': 10, 'a': 1}, {'b': 20, 'a': 2}, {'b': 30, 'c': {'d': 100, 'e': 1000}, 'a': 3}, {'c': {'d': 200, 'e': 2000}, 'a': 4}]
[1, 2, 3, 4]
[{'b': 10, 'a': 1}, {'b': 20, 'a': 2}, {'b': 30, 'c': {'d': 100, 'e': 1000}, 'a': 3}, {'c': {'d': 200, 'e': 2000}, 'a': 4}]
[{'b': 10, 'a': 1}, {'b': 20, 'a': 2}, {'b': 30, 'a': 3}, {'d': 100}, {'a': 4}, {'d': 200}]
['b', 'a', 'c.d']

I seem to be so close, but maybe I'm doing this completely the wrong way? Would something like marshmallow work better? That just seemed like overkill, as I'd have to define a class hierarchy. Thanks!


Solution

  • Here is a simple recursive solution. It flattens every nested dictionary, so 'c.e' is kept alongside 'c.d':

    from pprint import pprint
    
    
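    def flat_dict(d: dict) -> dict:
        """Flatten nested dicts, joining nested keys with '.'."""
        o = {}
        for k, v in d.items():
            if isinstance(v, dict):
                # Recurse, then prefix each nested key with its parent key.
                o.update({
                    k + '.' + key: value
                    for key, value in flat_dict(v).items()
                })
            else:
                o[k] = v
        return o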
    
    
    def main():
        # Same data as d1 in the question, including the 'results' key.
        d = {
            'results': [
                {'a': 1, 'b': 10},
                {'a': 2, 'b': 20},
                {'a': 3, 'b': 30, 'c': {'d': 100, 'e': 1000}},
                {'a': 4, 'c': {'d': 200, 'e': 2000}}
            ]
        }
    
        res = [
            flat_dict(e)
            for e in d['results']
        ]
        pprint(res)
    
    
    if __name__ == '__main__':
        main()
    

    result:

    [{'a': 1, 'b': 10},
     {'a': 2, 'b': 20},
     {'a': 3, 'b': 30, 'c.d': 100, 'c.e': 1000},
     {'a': 4, 'c.d': 200, 'c.e': 2000}]
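
The recursion above keeps every nested key, while the target format in the question lists only 'a', 'b' and 'c.d'. If only selected values are wanted, one possible variation is to drive the flattening from a list of dotted paths. This is a minimal sketch in plain Python, not an objectpath feature; the names `pick_paths` and `wanted` are made up for illustration, and it assumes the keys themselves contain no dots.

```python
def pick_paths(record, paths):
    """Return a flat dict holding only the dotted paths found in record."""
    flat = {}
    for path in paths:
        node = record
        for key in path.split('.'):
            if isinstance(node, dict) and key in node:
                node = node[key]
            else:
                break  # path is missing in this record, so skip it
        else:
            # The inner loop finished without break: the full path exists.
            flat[path] = node
    return flat


d1 = {'results': [
        {'a': 1, 'b': 10},
        {'a': 2, 'b': 20},
        {'a': 3, 'b': 30, 'c': {'d': 100, 'e': 1000}},
        {'a': 4, 'c': {'d': 200, 'e': 2000}}
    ]
}

# Hypothetical list of wanted columns, written as dotted paths.
wanted = ['a', 'b', 'c.d']

d2 = [pick_paths(r, wanted) for r in d1['results']]
print(d2)
```

Because the list of paths is just data, the same function can be reused for other API responses by swapping `wanted`, which gets close to the "declarative" style the question asks for.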