Tags: python, list, dictionary, recursion, nested-loops

Iterate a nested dictionary and filter specific fields


I have an example object that mixes lists and dicts:

{
    "field_1" : "aaa",
    "field_2": [
        {
        "name" : "bbb",
          .....
        "field_4" : "ccc",
        "field_need_to_filter" : False,
        },

        {
        "name" : "ddd",
          .....
        "details": [
            {
            "name" : "eee",
            ....
            "details" : [
                {
                "name": "fff",
                .....
                "field_10": {
                    "field_11": "rrr",
                    ...
                    "details": [
                        {
                        "name": "xxx",
                        ...
                        "field_need_to_filter": True,
                        },
                        {
                        "name": "yyy",
                        ...
                        "field_need_to_filter": True,
                        },
                        {
                        "field_13": "zzz",
                        ...
                        "field_need_to_filter": False,
                        }
                                ]
                                }
                },


        ]}]}

       ]
}

I'd like to walk this structure and, for every dict where field_need_to_filter is True, join the name values found along the path to it. For this example, the expected output would be: ["ddd.eee.fff.xxx", "ddd.eee.fff.yyy"]. I've been looking at this for too long and my brain has stopped working, so any help would be appreciated. Thanks.


Solution

  • OK, it took me some time to think through the different cases and fix bugs, but this works (at least on your example dict). Note that it assumes dicts containing "field_need_to_filter": True are end points: the function doesn't delve deeper into those. I'll be glad to add explanations to the code if you want some.

    mydict = {
        "field_1" : "aaa",
        "field_2": [
            {
            "name" : "bbb",
    
            "field_4" : "ccc",
            "field_need_to_filter" : False,
            },
    
            {
            "name" : "ddd",
    
            "details": [
                {
                "name" : "eee",
    
                "details" : [
                    {
                    "name": "fff",
    
                    "field_10": {
                        "field_11": "rrr",
    
                        "details": [
                            {
                            "name": "xxx",
    
                            "field_need_to_filter": True,
                            },
                            {
                            "name": "yyy",
    
                            "field_need_to_filter": True,
                            },
                            {
                            "field_13": "zzz",
    
                            "field_need_to_filter": False,
                            }
                                    ]
                                    }
                    },
    
    
            ]}]}
    
           ]
    }
    
    def filter_paths(thing, path=''):
        if isinstance(thing, dict):
            # if this dict has a name, append it to the dotted path
            if thing.get("name"):
                path += ('.' if path else '') + thing["name"]
            # if this dict has "field_need_to_filter": True, we've reached an end point, and return the path
            if thing.get("field_need_to_filter"):
                return [path]
            # else we delve deeper into every value of the dict
            result = []
            for value in thing.values():
                result.extend(filter_paths(value, path))
            return result
    
        # if the current object is a list, we simply delve deeper
        elif isinstance(thing, list):
            result = []
            for element in thing:
                result.extend(filter_paths(element, path))
            return result
    
        # We've reached a dead-end, so we return an empty list
        else:
            return []
            
    filter_paths(mydict)
    # Out[204]: ['ddd.eee.fff.xxx', 'ddd.eee.fff.yyy']
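
If flagged dicts are not guaranteed to be end points, a generator-based variant may be easier to adapt. The sketch below is only an illustration, not part of the answer above: the name iter_flagged_paths and the small sample structure are made up for this example, and it assumes that a flagged dict can itself contain further flagged dicts that should also be reported.

```python
def iter_flagged_paths(node, path=()):
    """Yield the dotted name path of every dict whose flag is True."""
    if isinstance(node, dict):
        name = node.get("name")
        if name:
            # extend the path with this dict's name (tuples avoid mutation)
            path = path + (name,)
        if node.get("field_need_to_filter") is True:
            yield ".".join(path)
        # assumption: keep descending even below a flagged dict
        for value in node.values():
            yield from iter_flagged_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            yield from iter_flagged_paths(item, path)
    # any other type is a leaf: nothing to yield


# hypothetical sample data, much smaller than the dict in the question
sample = {
    "name": "a",
    "details": [
        {
            "name": "b",
            "field_need_to_filter": True,
            "details": [{"name": "c", "field_need_to_filter": True}],
        },
        {"name": "d", "field_need_to_filter": False},
    ],
}

paths = list(iter_flagged_paths(sample))
print(paths)  # ['a.b', 'a.b.c']
```

Because it yields results lazily, the caller can stop early or wrap the call in list() as shown. To restore the end-point behaviour, return right after the yield instead of continuing into the dict's values.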