
Flatten a Python dict containing lists


I am trying to normalize a dictionary containing some lists. As an MVCE (Minimal, Verifiable, Complete Example), consider the following dictionary:

test_dict = {
    'name' : 'john',
    'age' : 20,
    'addresses' : [
        {
            'street': 'XXX',
            'number': 123,
            'complement' : [
                'HOUSE',
                'NEAR MARKET'
            ]
        },
        {
            'street': 'YYY',
            'number': 456,
            'complement' : [
                'AP',
                'NEAR PARK'
            ]
        },
    ],
    'phones' : [
        '123456'
    ],
    'gender' : 'MASC'
}

I want each list found in the dictionary to generate a line, so my desired output is:

{'name': 'john', 'age': 20, 'street': 'XXX', 'number': 123, 'complement': 'HOUSE', 'phones': '123456', 'gender' : 'MASC'}
{'name': 'john', 'age': 20, 'street': 'XXX', 'number': 123, 'complement': 'NEAR MARKET', 'phones': '123456', 'gender' : 'MASC'}
{'name': 'john', 'age': 20, 'street': 'YYY', 'number': 456, 'complement': 'AP', 'phones': '123456', 'gender' : 'MASC'}
{'name': 'john', 'age': 20, 'street': 'YYY', 'number': 456, 'complement': 'NEAR PARK', 'phones': '123456', 'gender' : 'MASC'}

However, my code cannot combine more than one list. My intention was to write a recursive function, so that I wouldn't have to worry about more complex structures (a dictionary with more lists inside dictionaries, etc.). When I run it, the output I get is:

{'name': 'john', 'age': 20, 'street': 'XXX', 'number': 123, 'complement': 'HOUSE'}
{'name': 'john', 'age': 20, 'street': 'XXX', 'number': 123, 'complement': 'NEAR MARKET'}
{'name': 'john', 'age': 20, 'street': 'YYY', 'number': 456, 'complement': 'AP'}
{'name': 'john', 'age': 20, 'street': 'YYY', 'number': 456, 'complement': 'NEAR PARK'}
{'name': 'john', 'age': 20, 'phones': '123456'}

My Python code (MVCE):

def get_list_values(lista, dicionario, key_name, results):
    if len(lista) > 0:
        for l in lista:
            if isinstance(l, dict):
                search_values(l, dicionario.copy(), results)
            else:
                dicionario_metodo = dicionario.copy()
                dicionario_metodo[key_name] = l
                results.append(dicionario_metodo)

def search_values(dicionario, test, results):
    for k, v in dicionario.items():
        if isinstance(v, list):
            get_list_values(v, test, k, results)
        else:
            test[k] = v
    if not any(isinstance(v, list) for v in dicionario.values()):
        results.append(test.copy())
    return results


test = {}
results = []
for r in search_values(test_dict, test, results):
    print(r)

In which part of my recursion am I going wrong, so it doesn't generate my desired output?


Edit 1: a second input, where a nested dictionary ('type') also contains a list:

test_dict = {
    'name' : 'john',
    'age' : 20,
    'addresses' : [
        {
            'street': 'XXX',
            'number': 123,
            'complement' : [
                'HOUSE',
                'NEAR MARKET'
            ]
        },
        {
            'street': 'YYY',
            'number': 456,
            'complement' : [
                'AP',
                'NEAR PARK'
            ]
        },
    ],
    'type' : {
        'category': 'G123',
        'products': [
            'test1',
            'test2'
        ]
    },
    'phones' : [
        '123456'
    ],
    'gender' : 'MASC'
}

Solution

  • It took me some time to get this right, but check this out.

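    def flat(out, *kvs):
        # out: (key, value) pairs collected so far; kvs: pairs still to process
        match kvs:
            case []:  # nothing left: one complete row
                yield out
            case (k, []), *kvs:  # empty list: drop the key
                yield from flat(out, *kvs)
            case (k, list(l)), *kvs:  # list: branch once per element
                for v in l:
                    yield from flat(out, (k, v), *kvs)
            case (_, dict(d)), *kvs:  # dict: splice its items in, discarding its own key
                yield from flat(out, *d.items(), *kvs)
            case (k, v), *kvs:  # scalar: record the pair
                yield from flat([*out, (k, v)], *kvs)
            case _:
                raise ValueError("Invalid")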
    

    That is all you need! This implementation relies on recursion, structural pattern matching (the match statement, available from Python 3.10), and generators.
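
For interpreters without `match`, the same idea can be sketched with plain `isinstance` checks. This is only an approximation of the function above, not a drop-in replacement, and the name `flat_compat` and the `sample` data are made up for illustration. The point it tries to show is that the pairs still to be processed travel along with every recursive call, so a row is emitted only once nothing is left. The code in the question, by contrast, appears to append a row as soon as it reaches a scalar inside one list, before the remaining keys and lists have been seen.

```python
def flat_compat(out, *kvs):
    """Yield lists of (key, value) pairs, one per combination of list items."""
    if not kvs:
        # Nothing left to process: the accumulated pairs form one complete row.
        yield out
        return
    (k, v), *rest = kvs
    if isinstance(v, list):
        if not v:
            # Empty list: drop the key and continue with the remaining pairs.
            yield from flat_compat(out, *rest)
        for item in v:
            # Branch once per element; the element is re-examined on the
            # next call because it may itself be a dict or a list.
            yield from flat_compat(out, (k, item), *rest)
    elif isinstance(v, dict):
        # Splice the nested dict's items in front of the remaining pairs.
        yield from flat_compat(out, *v.items(), *rest)
    else:
        # Scalar: record the pair and carry on with the remaining pairs.
        yield from flat_compat([*out, (k, v)], *rest)


# Hypothetical sample data, smaller than the input in the question.
sample = {'name': 'john', 'tags': ['a', 'b'], 'info': {'ids': [1, 2]}, 'gender': 'MASC'}
rows = [dict(r) for r in flat_compat([], (None, sample))]
```

Here `rows` holds four dicts, one per combination of a tag and an id, each also carrying `name` and `gender`.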

    You can try it out like this (the `...` is only a placeholder key for the top-level dict; the dict case discards it):

    x = map(dict, flat([], (..., test_dict)))
    print(*x, sep='\n')
    
    # {'name': 'john', 'age': 20, 'street': 'XXX', 'number': 123, 'complement': 'HOUSE', 'phones': '123456', 'gender': 'MASC'}
    # {'name': 'john', 'age': 20, 'street': 'XXX', 'number': 123, 'complement': 'NEAR MARKET', 'phones': '123456', 'gender': 'MASC'}
    # {'name': 'john', 'age': 20, 'street': 'YYY', 'number': 456, 'complement': 'AP', 'phones': '123456', 'gender': 'MASC'}
    # {'name': 'john', 'age': 20, 'street': 'YYY', 'number': 456, 'complement': 'NEAR PARK', 'phones': '123456', 'gender': 'MASC'}
    

    With your second input, the result is:

    # {'name': 'john', 'age': 20, 'street': 'XXX', 'number': 123, 'complement': 'HOUSE', 'category': 'G123', 'products': 'test1', 'phones': '123456', 'gender': 'MASC'}
    # {'name': 'john', 'age': 20, 'street': 'XXX', 'number': 123, 'complement': 'HOUSE', 'category': 'G123', 'products': 'test2', 'phones': '123456', 'gender': 'MASC'}
    # {'name': 'john', 'age': 20, 'street': 'XXX', 'number': 123, 'complement': 'NEAR MARKET', 'category': 'G123', 'products': 'test1', 'phones': '123456', 'gender': 'MASC'}
    # {'name': 'john', 'age': 20, 'street': 'XXX', 'number': 123, 'complement': 'NEAR MARKET', 'category': 'G123', 'products': 'test2', 'phones': '123456', 'gender': 'MASC'}
    # {'name': 'john', 'age': 20, 'street': 'YYY', 'number': 456, 'complement': 'AP', 'category': 'G123', 'products': 'test1', 'phones': '123456', 'gender': 'MASC'}
    # {'name': 'john', 'age': 20, 'street': 'YYY', 'number': 456, 'complement': 'AP', 'category': 'G123', 'products': 'test2', 'phones': '123456', 'gender': 'MASC'}
    # {'name': 'john', 'age': 20, 'street': 'YYY', 'number': 456, 'complement': 'NEAR PARK', 'category': 'G123', 'products': 'test1', 'phones': '123456', 'gender': 'MASC'}
    # {'name': 'john', 'age': 20, 'street': 'YYY', 'number': 456, 'complement': 'NEAR PARK', 'category': 'G123', 'products': 'test2', 'phones': '123456', 'gender': 'MASC'}
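
The eight rows follow from how the branches combine: keys of a dict are combined with one another (their counts multiply), while items of a list are alternatives (their counts add). A rough sketch to check the row count without building the rows, assuming a hypothetical helper `count_rows` that mirrors that branching:

```python
from math import prod


def count_rows(value):
    """Number of flattened rows a value expands to (hypothetical helper)."""
    if isinstance(value, dict):
        # Keys of a dict are combined, so their counts multiply.
        return prod(count_rows(v) for v in value.values())
    if isinstance(value, list):
        # Items of a list are alternatives, so their counts add;
        # an empty list is skipped and leaves the count unchanged.
        return sum(count_rows(v) for v in value) if value else 1
    # A scalar contributes exactly one choice.
    return 1


# Compact copy of the second input from the question.
second_input = {
    'name': 'john',
    'age': 20,
    'addresses': [
        {'street': 'XXX', 'number': 123, 'complement': ['HOUSE', 'NEAR MARKET']},
        {'street': 'YYY', 'number': 456, 'complement': ['AP', 'NEAR PARK']},
    ],
    'type': {'category': 'G123', 'products': ['test1', 'test2']},
    'phones': ['123456'],
    'gender': 'MASC',
}

print(count_rows(second_input))  # 8: 4 address/complement rows x 2 products x 1 phone
```

Keep in mind that the count grows multiplicatively with every additional multi-item list, so deeply nested inputs can expand into a large number of rows.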
    

    Edit: Mapped the key-value pairs into dicts as per the requirements and made the code neater.