I am iterating a complex list in python with this format:
[
{
<parent id>: [
<child id>,
<child id>
],
<parent id>: [
<child id>
]
},
{
<parent id>: [
<child id>,
<child id>,
<child id>
],
<parent id>: [
<child id>
]
}
]
The list will have dict as elements. These dict have keys of <parent id>
and values of a list of <child id>
There can be the same <parent id>
in different dict, but <child id>
can only belong to one <parent id>
. An example is this:
[
{
2: [1, 5],
3: [3, 7],
4: [6]
},
{
1: [2, 4, 8],
4: [6]
}
]
Parent id 4
is in both dict elements but all child ids are unique to a parent id.
Now I am iterating this data structure as input because I want to make sure that the condition of all childs being unique to a parent id is met. This is my code:
def check_format(self, mapping):
# mapping is the data structure
unique_parent_list = []
child_list = []
for x in range(0, 2):
for parent in mapping[x].keys():
if parent not in unique_parent_list:
unique_parent_list.append(parent)
for child in mapping[x][parent]:
child_list.append(child)
if len(child_list) > len(set(child_list)):
return 'Condition not met'
else:
return 'Condition met'
This works, but I do not like that it is O^4 complexity or something like that. Is there a way to simplify or code this for better performance?
You clearly have a mapping relationship from child to parent. The easiest thing I can think of is just to make a dict with children as keys. If you encounter a child that's already in, check the parent value.
The lookup and insertion happen in constant time (dict keys are effectively a hash set). You can also use short circuiting more effectively since you can stop the moment you find a child with multiple parents:
def check_format(map_list):
check = {}
for parent, children in (i for d in map_list for i in d.items()):
for child in children:
if check.setdefault(child, parent) != parent:
return False
return True
This will iterate exactly once per child and perform a constant-time (ideally) operation on each using dict.setdefault
.