I have a python script that extracts data from an excel, more precisely, data from three columns: Finished Good, Parent Part Code and Material Code. The three columns look like this:
Material Code Parent Part Code Finished Good
M1 P1 F1
M2 P2 F2
M3 M2 F2
M4 P3 F2
.....
Note: Material Code can also be in the Parent Part Code column.
The tree should look like this:
F1
P1
M1
F2
P2
M2
M3
P3
M4
The Python script is as follows:
import pandas as pd
from anytree import Node, RenderTree
import json
# Read excel
df = pd.read_excel('excelFile.xlsx')
root_dict = {}
for index, row in df.iterrows():
finished_good = row['Finished Good']
parent_part_code = row['Parent Part Code']
material_code = row['Material Code']
if finished_good not in root_dict:
# If not, add as root
root = Node(finished_good)
root_dict[finished_good] = root
else:
# If Finished Good is in the tree, get the root
root = root_dict[finished_good]
if parent_part_code in [node.name for node in root.descendants]:
# If parent node already exists in the tree, find it and add children
parent_node = [node for node in root.descendants if node.name == parent_part_code][0]
if material_code in [node.name for node in root.descendants]:
# If Material Code already exists in the tree, find its sub-tree and copy it to the current node
material_node = [node for node in root.descendants if node.name == material_code][0]
material_node.parent = parent_node
else:
# If Material Code does not exist in the tree, create a new sub-tree
material = Node(material_code, parent=parent_node)
while material_code in df['Parent Part Code'].values:
filtered = df[df['Parent Part Code'] == material_code]
parent_code = filtered.iloc[0]['Parent Part Code']
material_code = filtered.iloc[0]['Material Code']
parent_node = material
material = Node(material_code, parent=parent_node)
else:
# If parent node doesn't exist in the tree, create new sub-tree
parent_node = Node(parent_part_code, parent=root)
material = Node(material_code, parent=parent_node)
while material_code in df['Parent Part Code'].values:
filtered = df[df['Parent Part Code'] == material_code]
parent_code = filtered.iloc[0]['Parent Part Code']
material_code = filtered.iloc[0]['Material Code']
parent_node = material
material = Node(material_code, parent=parent_node)
# Print the trees
for root in root_dict.values():
print(RenderTree(root))
# Save the trees
def node_to_dict(node):
return {
'name': node.name,
'children': [node_to_dict(child) for child in node.children]
}
with open('normTrees.json', 'w') as file:
json.dump({key: node_to_dict(root) for key, root in root_dict.items()}, file)
print("The trees were successfully saved")
The output results are:
Node('/10020115HU')
├── Node('/10020115HU/V371')
│ ├── Node('/10020115HU/V371/YG10-30300')
│ ├── Node('/10020115HU/V371/VECTC002')
│ │ ├── Node('/10020115HU/V371/VECTC002/YG10-30200')
│ │ ├── Node('/10020115HU/V371/VECTC002/VNCTC002')
│ │ │ ├── Node('/10020115HU/V371/VECTC002/VNCTC002/YG10-30300')
│ │ │ ├── Node('/10020115HU/V371/VECTC002/VNCTC002/SZVIZ')
│ │ │ └── Node('/10020115HU/V371/VECTC002/VNCTC002/RVSZALLPOR')
│ │ └── Node('/10020115HU/V371/VECTC002/RVECTC002')
│ └── Node('/10020115HU/V371/U100KOCS')
│ └── Node('/10020115HU/V371/U100KOCS/YG10-30300')
├── Node('/10020115HU/C00211L0')
│ ├── Node('/10020115HU/C00211L0/V371')
│ │ └── Node('/10020115HU/C00211L0/V371/YG10-30300')
│ ├── Node('/10020115HU/C00211L0/RWINNOVERS')
│ └── Node('/10020115HU/C00211L0/RSZENNYEZETT')
├── Node('/10020115HU/10020115HU')
│ └── Node('/10020115HU/10020115HU/TTK00001HU')
├── Node('/10020115HU/D67AMBR910')
│ └── Node('/10020115HU/D67AMBR910/RWINNOVERS')
└── Node('/10020115HU/D67LTRR910')
├── Node('/10020115HU/D67LTRR910/RWINNOVERS')
└── Node('/10020115HU/D67LTRR910/RSECONDUST')
The trees were successfully saved
The script does not work as expected because code V371 appears as a child node for Finished Good 10020115HU and as a sub-node for code C00211L0.
I would need the script to be modified so that it displays the tree as follows:
Node('/10020115HU')
├── Node('/10020115HU/C00211L0')
│ ├── Node('/10020115HU/C00211L0/V371')
│ │ └── Node('/10020115HU/C00211L0/V371/YG10-30300')
| | └── Node('/10020115HU/C00211L0/V371/VECTC002')
| | | └── Node('/10020115HU/C00211L0/V371/VECTC002/YG10-30200')
| └── Node('/10020115HU/C00211L0//V371/VECTC002/VNCTC002')
│ │ | │ ├── Node('/10020115HU/C00211L0/V371/VECTC002/VNCTC002/YG10-30300')
│ │ | │ ├── Node('/10020115HU/C00211L0/V371/VECTC002/VNCTC002/SZVIZ')
│ │ | │ └── Node('10020115HU/C00211L0/V371/VECTC002/VNCTC002/RVSZALLPOR')
│ │ | └── Node('/10020115HU/C00211L0/V371/VECTC002/RVECTC002')
│ | Node('/10020115HU/C00211L0/V371/U100KOCS')
│ | └── Node('/10020115HU/C00211L0/V371/U100KOCS/YG10-30300')
│ ├── Node('/10020115HU/C00211L0/RWINNOVERS')
│ └── Node('/10020115HU/C00211L0/RSZENNYEZETT')
├── Node('/10020115HU/10020115HU')
│ └── Node('/10020115HU/10020115HU/TTK00001HU')
├── Node('/10020115HU/D67AMBR910')
│ └── Node('/10020115HU/D67AMBR910/RWINNOVERS')
└── Node('/10020115HU/D67LTRR910')
├── Node('/10020115HU/D67LTRR910/RWINNOVERS')
└── Node('/10020115HU/D67LTRR910/RSECONDUST')
The problem is that your input may list Parent Part Codes that (much) later occur as Material Code ("V371" is in that case), which means they are not a direct child of the root. If this later occurrence includes a Parent Part Code that is new, then a new node is created (unconditionally) for that Material Code, and so in this case you have two nodes with the name "V371" in your tree.
It can be quite tedious to do this in one pass. I would suggest doing this in two passes:
This looks easier:
root_dict = {}
for sweep in range(2):
for index, row in df.iterrows():
finished_good = row['Finished Good']
parent_part_code = row['Parent Part Code']
material_code = row['Material Code']
if finished_good not in root_dict:
root_dict[finished_good] = Node(finished_good), {}
root, nodes = root_dict[finished_good]
if parent_part_code not in nodes:
nodes[parent_part_code] = Node(parent_part_code, parent=root)
if sweep:
child = nodes[material_code] if material_code in nodes else Node(material_code)
child.parent = nodes[parent_part_code]
After this has executed, the tree is ready to print. As the above code associates a nodes
dictionary for each tree, the values of the root_dict
dictionary are now tuples (pairs), so the loop should extract the root from that tuple:
for root, _ in root_dict.values():
print(RenderTree(root))
Output:
Node('/10020115HU')
├── Node('/10020115HU/C00211L0')
│ ├── Node('/10020115HU/C00211L0/V371')
│ │ ├── Node('/10020115HU/C00211L0/V371/YG10-30300')
│ │ ├── Node('/10020115HU/C00211L0/V371/VECTC002')
│ │ │ ├── Node('/10020115HU/C00211L0/V371/VECTC002/YG10-30200')
│ │ │ ├── Node('/10020115HU/C00211L0/V371/VECTC002/VNCTC002')
│ │ │ │ ├── Node('/10020115HU/C00211L0/V371/VECTC002/VNCTC002/YG10-30300')
│ │ │ │ ├── Node('/10020115HU/C00211L0/V371/VECTC002/VNCTC002/SZVIZ')
│ │ │ │ └── Node('/10020115HU/C00211L0/V371/VECTC002/VNCTC002/RVSZALLPOR')
│ │ │ └── Node('/10020115HU/C00211L0/V371/VECTC002/RVECTC002')
│ │ ├── Node('/10020115HU/C00211L0/V371/U100KOCS')
│ │ │ └── Node('/10020115HU/C00211L0/V371/U100KOCS/YG10-30300')
│ │ └── Node('/10020115HU/C00211L0/V371/YG10-30300')
│ ├── Node('/10020115HU/C00211L0/RWINNOVERS')
│ └── Node('/10020115HU/C00211L0/RSZENNYEZETT')
├── Node('/10020115HU/10020115HU')
│ └── Node('/10020115HU/10020115HU/TTK00001HU')
├── Node('/10020115HU/D67AMBR910')
│ └── Node('/10020115HU/D67AMBR910/RWINNOVERS')
└── Node('/10020115HU/D67LTRR910')
├── Node('/10020115HU/D67LTRR910/RWINNOVERS')
└── Node('/10020115HU/D67LTRR910/RSECONDUST')