I have this data structure, a dict composed by lists of lists (a sort of tree structure), in which the letters are phrases:
dict = {
'A': [ ['A1', ['A2'], ['A3', ['A4'], ['A5'] ] ],
'B': [ ['B1', ['B2'] ],
'C': [ ['C1', ['C2', ['C3'] ], ['C4'] ] ]
}
What I need is a function that returns the 'list of list of elements from a root to a leaf'.
In other words, I need to explore the entire tree following each path, concatenating the phrases reconstructing the conversation.
This tree structure was constructed by crawling Reddit, so each path is a sequence of answers, so a single conversation.
For key 'B' there is 1 possible sequence: ['B1', 'B2'] (the dict key is not relevant).
For key 'A' there are 3 possible sequences: ['A1', 'A2'], ['A1', 'A3', 'A4'], ['A1', 'A3', 'A5']
For key 'C' there are 2 possible sequences: ['C1', 'C2', 'C3'], ['C1', 'C4']
It is very difficult to formalize this problem, I hope these sequences are correct :)
So, what I expect is this output, a list containing ALL the possible sequences in the tree in the form of a list of phrases:
[
['A1', 'A2'],
['A1', 'A3', 'A4'],
['A1', 'A3', 'A5'],
['B1', 'B2'],
['C1', 'C2', 'C3'],
['C1', 'C4']
]
I don't need any particular order in the resulting list, I want only the sequences.
Thank you very much, every help will be very appreciated!
Here is a solution to your problem.
First we make a function to go from the lists you have to a dictionary structure:
def convert_to_dict(conversation_list):
dict_ = {}
for sublist in conversation_list:
dict_[sublist[0]] = convert_to_dict(sublist[1:])
return dict_
print(convert_to_dict([['A1', ['A2'], ['A3', ['A4'], ['A5']]]]))
# {'A1': {'A2': {}, 'A3': {'A4': {}, 'A5': {}}}}
Then we make a function that takes this dictionary and returns the conversations in it:
def conversations(rv, lst, conversation_dict):
for key, value in conversation_dict.items():
if value != {}:
rv = conversations(rv, lst + [key], value)
else:
rv.append(lst + [key])
return rv
print(conversations([], [], {'A1': {'A2': {}, 'A3': {'A4': {}, 'A5': {}}}}))
# [['A1', 'A2'], ['A1', 'A3', 'A4'], ['A1', 'A3', 'A5']]
And then we can combine the two in a function that takes your dict and returns all the conversations:
def get_conversations(dict):
rv = []
for _, value in dict.items():
dict_ = convert_to_dict(value)
rv += conversations([], [], dict_)
return rv
dict_ = {
'A': [ ['A1', ['A2'], ['A3', ['A4'], ['A5'] ] ] ],
'B': [ ['B1', ['B2'] ] ],
'C': [ ['C1', ['C2', ['C3'] ], ['C4'] ] ]
}
print(get_conversations(dict_))
#[['A1', 'A2'], ['A1', 'A3', 'A4'], ['A1', 'A3', 'A5'], ['B1', 'B2'], ['C1', 'C2', 'C3'], ['C1', 'C4']]