Avoiding key error storing values in nested dictionary (Python)


Introduction

The following dictionary has three levels of keys and then a value.

d = {
    1:{
        'A':{
            'i': 100,
            'ii': 200
            }, 
        'B':{
            'i': 300
            }
        }, 
    2:{
        'A':{
            'ii': 500
            }
        }
    }

Examples of entries that need to be added:

d[1]['B']['ii'] = 600      # OK
d[2]['C']['iii'] = 700     # KeyError on 'C'
d[3]['D']['iv'] = 800      # KeyError on 3

Problem Statement

I wanted to create code that would create the necessary nested keys and avoid any key errors.

Solution 1

The first solution I came up with, was:

def NewEntry_1(d, lv1, lv2, lv3, value):
    if lv1 in d:
        if lv2 in d[lv1]:
            d[lv1][lv2][lv3] = value
        else:
            d[lv1][lv2] = {lv3: value}
    else:
        d[lv1] = {lv2: {lv3: value}}

Seems legit, but embedding this in other pieces of code made it mind-boggling. I searched Stack Overflow for other solutions and read up on the get() and setdefault() methods.

Solution 2

There is plenty of material to find about get() and setdefault(), but not so much on nested dictionaries. Ultimately I was able to come up with:

def NewEntry_2(d, lv1, lv2, lv3, value):
    return d.setdefault(lv1, {}).setdefault(lv2,{}).setdefault(lv3, value)

It is one line of code, so it is not really necessary to make it a function. It is also easy to modify to include operations:

d[lv1][lv2][lv3] = d.setdefault(lv1, {}).setdefault(lv2,{}).setdefault(lv3, 0) + value

Seems perfect?
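
One caveat worth checking: setdefault() on the last level only stores the value when the key is missing, so NewEntry_2 leaves an existing entry unchanged, whereas Solution 1 assigns with `=`. A minimal sketch of the difference, where new_entry_overwrite is a hypothetical helper name and not part of the code above:

```python
def new_entry_overwrite(d, lv1, lv2, lv3, value):
    # Hypothetical helper: create missing levels with setdefault,
    # then assign at the last level so an existing value is replaced.
    d.setdefault(lv1, {}).setdefault(lv2, {})[lv3] = value

d = {1: {'B': {'i': 300}}}

# setdefault on the last level returns the existing value and does not overwrite it.
kept = d.setdefault(1, {}).setdefault('B', {}).setdefault('i', 999)
unchanged = d[1]['B']['i']

new_entry_overwrite(d, 1, 'B', 'i', 999)   # replaces the existing value
new_entry_overwrite(d, 3, 'D', 'iv', 800)  # creates all three levels
```

Which behavior is wanted depends on whether a repeated entry should replace the stored value or leave it alone.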

Question

When adding large quantities of entries and doing many modifications, is option 2 better than option 1? Or should I define function 1 and call it? The answers I'm looking for should take into account speed and/or potential for errors.
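
The speed part of this can be measured rather than guessed. Below is a rough sketch using the standard timeit module. The function names, the fill() helper and the workload size are assumptions chosen for illustration, the first function tests membership against d[lv1] at the second level, and the timings will vary by machine and Python version.

```python
import timeit

def new_entry_1(d, lv1, lv2, lv3, value):
    # Explicit membership checks at each level.
    if lv1 in d:
        if lv2 in d[lv1]:
            d[lv1][lv2][lv3] = value
        else:
            d[lv1][lv2] = {lv3: value}
    else:
        d[lv1] = {lv2: {lv3: value}}

def new_entry_2(d, lv1, lv2, lv3, value):
    # Chained setdefault for the outer levels, plain assignment at the last level.
    d.setdefault(lv1, {}).setdefault(lv2, {})[lv3] = value

def fill(func, n=30):
    # Hypothetical workload: n * n node pairs with 3 iterations each.
    d = {}
    for node_from in range(n):
        for node_to in range(n):
            for iteration in range(3):
                func(d, node_from, node_to, iteration, node_from + node_to + iteration)
    return d

t1 = timeit.timeit(lambda: fill(new_entry_1), number=5)
t2 = timeit.timeit(lambda: fill(new_entry_2), number=5)
print(f"explicit checks: {t1:.4f}s, setdefault chain: {t2:.4f}s")
```

Both functions build the same dictionary, so the comparison isolates the cost of the insertion strategy itself.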

Examples

NewEntry_1(d, 1, 'B', 'ii', 600)
# output = {1: {'A': {'i': 100, 'ii': 200}, 'B': {'i': 300, 'ii': 600}}, 2: {'A': {'ii': 500}}}

NewEntry_1(d, 2, 'C', 'iii', 700)
# output = {1: {'A': {'i': 100, 'ii': 200}, 'B': {'i': 300, 'ii': 600}}, 2: {'A': {'ii': 500}, 'C': {'iii': 700}}}

NewEntry_1(d, 3, 'D', 'iv', 800)
# output = {1: {'A': {'i': 100, 'ii': 200}, 'B': {'i': 300, 'ii': 600}}, 2: {'A': {'ii': 500}, 'C': {'iii': 700}}, 3: {'D': {'iv': 800}}}

More background

I'm a business analyst exploring the use of Python to create a graph DB that would help me with a very specific analysis. The dictionary structure is used to store the influence one node has on one of its neighbors:

  • lv1 is Node From
  • lv2 is Node To
  • lv3 is Iteration
  • value is Influence (in %)

In the first iteration Node 1 has direct influence on Node 2. In the second iteration Node 1 influences all the Nodes that Node 2 is influencing.

I'm aware of packages that can help me with this (networkx), but I'm trying to understand Python and graph DBs before I start using them.


Solution

  • For the nested dictionaries, take a look at collections.defaultdict. Using it will save you much of the function-calling overhead of chained setdefault() calls, because missing keys are created on first access. A nested defaultdict uses lambda functions as the default factories of its outer levels:

    from collections import defaultdict

    d = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))  # new, shiny, empty
    d[1]['B']['ii'] = 600      # OK
    d[2]['C']['iii'] = 700     # OK
    d[3]['D']['iv'] = 800      # OK
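
Two consequences of this construction are worth seeing in a short sketch: the int factory makes accumulation with += work on missing leaves, and merely reading a missing key inserts it. The keys used below are arbitrary examples.

```python
from collections import defaultdict

d = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

# Accumulating: a missing leaf starts at int() == 0, so += needs no setup.
d[1]['B']['ii'] += 600
d[1]['B']['ii'] += 50

# Reading a missing key also creates it, at every level that was touched.
value = d[9]['Z']['x']
created = 9 in d

# Membership tests do not create anything, so use them for read-only lookups.
present = 7 in d and 'Q' in d[7]
still_absent = 7 not in d
```

If the accidental key creation is a concern, checking with `in` first or converting the finished structure to plain dicts avoids it.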
    

    Update: A useful trick for creating an arbitrarily deep nested defaultdict is the following:

    def tree():
        return defaultdict(tree)
    
    d = tree()  
    # now any depth is possible
    # d[1][2][3][4][5][6][7][8] = 9
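
Once such a tree is filled, it can be handy to turn it back into plain dicts so that later lookups of missing keys raise KeyError again instead of silently creating entries. A minimal sketch, where to_dict is a hypothetical helper and not part of the standard library:

```python
from collections import defaultdict

def tree():
    # Every missing key yields another tree, so any depth works.
    return defaultdict(tree)

def to_dict(node):
    # Hypothetical helper: recursively convert nested defaultdicts to plain dicts.
    if isinstance(node, defaultdict):
        return {key: to_dict(child) for key, child in node.items()}
    return node

d = tree()
d[1]['B']['ii'] = 600
d[3]['D']['iv'] = 800

plain = to_dict(d)
```

Note that the leaves of tree() have no numeric default: a missing leaf is itself a defaultdict, so `d[1]['B']['new'] += 1` raises a TypeError. For accumulation, the fixed-depth version with an int factory at the last level is the better fit.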