I'm looking to count how many people are below a given user of the data frame.
Employee | Manager |
---|---|
A | - |
B | A |
C | A |
D | A |
E | A |
F | B |
G | B |
H | C |
I | C |
I would like to get in the output: I, H, G, F, E and D have no employees below C has two employees (H and I) below it B has two employees (F and G) A has eight employees below him (B, C, D and E plus the employees of B and C)
Would anyone have any suggestions? In my DF I have more hierarchy layers and a very large amount of data.
I thought about storing it in a dictionary and doing a loop to update it, but I believe that this solution is not efficient at all. I would like to know if there is any more efficient technique to solve this type of problem.
I would use a directed graph with networkx
. This is a super fun python package.
import networkx as nx, pandas as pd
#set up data
employee = ['A', 'B', 'C','D','E','F','G','H','I']
manager = ['', 'A', 'A','A','A','B','B','C','C']
relations = pd.DataFrame(list(zip(employee,manager)), columns = ['Employee', 'Manager'])
# If there is no manager, make it the employee
relations.Manager = np.where(relations.Manager == '', relations.Employee, relations.Manager)
# or might need depending on data format:
relations.Manager = np.where(relations.Manager.isna(), relations.Employee, relations.Manager)
# Create tuples for 'edges'
relations['edge'] = list(zip(relations.Manager, relations.Employee))
# Create graph
G = nx.DiGraph()
G.add_nodes_from(list(relations.Employee))
G.add_edges_from(list(set(relations.edge)))
#Find all the descendants of nodes/employees
relations['employees_below'] = relations.apply(lambda row: nx.descendants(G,row.Employee), axis = 1)
returns:
Employee Manager edge employees_below
0 A A (A, A) {C, G, I, D, H, F, E, B}
1 B A (A, B) {F, G}
2 C A (A, C) {H, I}
3 D A (A, D) {}
4 E A (A, E) {}
5 F B (B, F) {}
6 G B (B, G) {}
7 H C (C, H) {}
8 I C (C, I) {}
The way it works: graphs are nodes and edges. In this case, your nodes are employees and your edges are a relationship between a manager and an employee. Do a quick google for 'networkx directed graph' images and you'll get the idea of what this looks like in an image representation.
(manager, employee)
and save it somewhere (I chose to make it a column in the df
called edges
).(manager, employee)
discussed previously.descendants
function to the employee in each row with apply
.