
python - parse nested dictionaries in list to store parent & child relationships in new list


I parsed a mvn dependency tree into a list. I want to go through this list and store each parent + child combination in a new list. An excerpt of the parsed mvn tree is below (printed with pprint); I added # comments to show the relationships more explicitly.

[({'name': '"org.antlr antlr4"'},    #parent1
  {'children': [({'name': '"org.antlr antlr4-runtime"'},    #child1-1
                ({'name': '"org.antlr antlr-runtime"'},    #child1-2
                ({'name': '"org.antlr ST4"'},    #child1-3
                ({'name': '"org.abego.treelayout org.abego.treelayout.core"'},    #child1-4 & parent2
                 {'children': [({'name': '"org.hamcrest hamcrest-core"'},   #child2-1
({'name': '"org.slf4j slf4j-log4j12"'},    #parent3
 {'children': [({'name': '"org.apache.commons commons-lang3"'})]    #child3-1

Here's my messy attempt:

def relate(tree):

    for name, subtree in tree.items():
        group, artifact = name.split(":")
        g = "groupId:" + group
        a = "artifactId:" + artifact
        c = {"children": "children"}

    family = []
    parent = name.group + name.artifact
    if subtree:
        for c in subtree:
            child = name.group + name.artifact
        family.append((parent, child))

    return family

Is there a way to iterate through this and return a new list of parent/child pairs like the one shown below?

[[nameParent1, nameChild1-1],
[nameParent1, nameChild1-2],
[nameParent1, nameChild1-3],
[nameParent1, nameChild1-4],
[nameParent2, nameChild2-1],
[nameParent3, nameChild3-1]]

So for this excerpt it would be

[[org.antlr antlr4, org.antlr antlr4-runtime],
[org.antlr antlr4, org.antlr antlr-runtime],
[org.antlr antlr4, org.antlr ST4],
[org.antlr antlr4, org.abego.treelayout org.abego.treelayout.core],
[org.abego.treelayout org.abego.treelayout.core, org.hamcrest hamcrest-core],
[org.slf4j slf4j-log4j12, org.apache.commons commons-lang3]]

I'm unsure how to iterate through this while keeping track of the relationships. It also has to be general enough to handle any number of children that have children of their own, to any depth (let me know if this needs clarification). Thanks in advance!


# FINAL CODE -> based on Michael Bianconi's answer
def getParentsChildren(mvn: tuple) -> list:
    result = []
    parent = mvn[1]['oid']
    children = mvn[5]['children']
    for child in children:
        result.append([parent, child[1]['oid']])
        if len(child) >= 2:  # MODIFIED LINE
            result.extend(getParentsChildren(child))
    return result

def getAll(mvn: list) -> list:
    result = []
    for m in mvn:
        result.extend(getParentsChildren(m))
    return result    # MODIFIED LINE

Solution

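  • The whole thing is a list of tuples, so loop through it. In each tuple, the first item is the parent and the second item is a dict whose 'children' key holds a list of tuples of the same shape (as posted, it's technically a bunch of tuples nested inside each other, but I'll assume that's a typo since you never close them). Recursing into every child that has its own 'children' entry handles any depth.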

    def getParentsChildren(mvn: tuple) -> list:
        result = []
        parent = mvn[0]['name']
        children = mvn[1]['children']
    
        for child in children:
            # Record the direct parent -> child relationship
            result.append([parent, child[0]['name']])
            if len(child) == 2:  # has children
                result.extend(getParentsChildren(child))
    
        return result
    
    def getAll(mvn: list) -> list:
        result = []
        for m in mvn:
            result.extend(getParentsChildren(m))
        return result
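
To check the approach end to end, here is a minimal runnable sketch against the excerpt from the question. It rests on a few assumptions that are not in the original post: the unclosed brackets are closed the way the # comments suggest, leaf nodes are 1-tuples, and the inner double quotes around the names are dropped. The names `tree`, `iter_pairs` and `all_pairs` are made up for this illustration, and the generator form is just one way to write the same recursion.

```python
from typing import Iterator

# Assumed shape: each node is a tuple whose first item is {'name': ...} and
# whose optional second item is {'children': [node, ...]}. Leaves are 1-tuples.
tree = [
    ({'name': 'org.antlr antlr4'},
     {'children': [
         ({'name': 'org.antlr antlr4-runtime'},),
         ({'name': 'org.antlr antlr-runtime'},),
         ({'name': 'org.antlr ST4'},),
         ({'name': 'org.abego.treelayout org.abego.treelayout.core'},
          {'children': [({'name': 'org.hamcrest hamcrest-core'},)]}),
     ]}),
    ({'name': 'org.slf4j slf4j-log4j12'},
     {'children': [({'name': 'org.apache.commons commons-lang3'},)]}),
]


def iter_pairs(node: tuple) -> Iterator[list]:
    """Yield [parent, child] pairs for a node and all of its descendants."""
    parent = node[0]['name']
    # A node without a second item is a leaf and contributes no pairs.
    children = node[1]['children'] if len(node) > 1 else []
    for child in children:
        yield [parent, child[0]['name']]
        # Recurse so the child is also reported as a parent of its own children.
        yield from iter_pairs(child)


def all_pairs(forest: list) -> list:
    """Collect the pairs of every top-level node into one flat list."""
    pairs = []
    for root in forest:
        pairs.extend(iter_pairs(root))
    return pairs


for pair in all_pairs(tree):
    print(pair)
```

Under those assumptions the loop prints the six pairs in the same order as the desired output in the question. If your real tuples carry the name and children at other positions (as in the final code above, which reads `mvn[1]['oid']` and `mvn[5]['children']`), only the index lookups need to change.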