Search code examples
pythonlistdictionarysearchdepth

I need a python function to depth first search into a dict of lists of lists


I have this data structure, a dict composed by lists of lists (a sort of tree structure), in which the letters are phrases:

dict = {
   'A': [ ['A1', ['A2'], ['A3', ['A4'], ['A5'] ] ],
   'B': [ ['B1', ['B2'] ],
   'C': [ ['C1', ['C2', ['C3'] ], ['C4'] ] ]
}

What I need is a function that returns the 'list of list of elements from a root to a leaf'.

In other words, I need to explore the entire tree following each path, concatenating the phrases reconstructing the conversation.

This tree structure was constructed by crawling Reddit, so each path is a sequence of answers, so a single conversation.

For key 'B' there is 1 possible sequence: ['B1', 'B2'] (the dict key is not relevant).

For key 'A' there are 3 possible sequences: ['A1', 'A2'], ['A1', 'A3', 'A4'], ['A1', 'A3', 'A5']

For key 'C' there are 2 possible sequences: ['C1', 'C2', 'C3'], ['C1', 'C4']

It is very difficult to formalize this problem, I hope these sequences are correct :)

So, what I expect is this output, a list containing ALL the possible sequences in the tree in the form of a list of phrases:

[
   ['A1', 'A2'],
   ['A1', 'A3', 'A4'],
   ['A1', 'A3', 'A5'],
   ['B1', 'B2'],
   ['C1', 'C2', 'C3'],
   ['C1', 'C4']
]

I don't need any particular order in the resulting list, I want only the sequences.

Thank you very much, every help will be very appreciated!


Solution

  • Here is a solution to your problem.

    First we make a function to go from the lists you have to a dictionary structure:

    def convert_to_dict(conversation_list):
        dict_ = {}
        for sublist in conversation_list:
            dict_[sublist[0]] = convert_to_dict(sublist[1:])
        return dict_
    
    print(convert_to_dict([['A1', ['A2'], ['A3', ['A4'], ['A5']]]]))
    # {'A1': {'A2': {}, 'A3': {'A4': {}, 'A5': {}}}}
    

    Then we make a function that takes this dictionary and returns the conversations in it:

    def conversations(rv, lst, conversation_dict):
        for key, value in conversation_dict.items():
            if value != {}:
                rv = conversations(rv, lst + [key], value)
            else:
                rv.append(lst + [key])
        return rv
    
    print(conversations([], [], {'A1': {'A2': {}, 'A3': {'A4': {}, 'A5': {}}}}))
    # [['A1', 'A2'], ['A1', 'A3', 'A4'], ['A1', 'A3', 'A5']]
    

    And then we can combine the two in a function that takes your dict and returns all the conversations:

    def get_conversations(dict):
        rv = []
        for _, value in dict.items():
            dict_ = convert_to_dict(value)
            rv += conversations([], [], dict_)
        return rv
    
    dict_ = {
       'A': [ ['A1', ['A2'], ['A3', ['A4'], ['A5'] ] ] ],
       'B': [ ['B1', ['B2'] ] ],
       'C': [ ['C1', ['C2', ['C3'] ], ['C4'] ] ]
    }
    print(get_conversations(dict_))
    #[['A1', 'A2'], ['A1', 'A3', 'A4'], ['A1', 'A3', 'A5'], ['B1', 'B2'], ['C1', 'C2', 'C3'], ['C1', 'C4']]