I have a dataframe with the employee emails, manager emails and the manager hierarchy columns. Am trying to get the number of teams that a manager has.
My current dataframe
emp_email mgr_email mgr_hier_01 mgr_hier_02 mgr_hier_03
[email protected] [email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] NAN NAN
what I hope to achieve is a column which gives the counts of number of teams a manager has if the employee is a manager. For example, [email protected] has 2 managers reporting to her ([email protected] and [email protected]) so the count of teams under her should be 2. While [email protected] has no managers reporting to him but he is a manager managing 2 individual contributors ([email protected] and [email protected]). So the count of teams under [email protected] should be 1.
emp_email mgr_email mgr_hier_01 mgr_hier_02 mgr_hier_03 num_teams_if_mgr
[email protected] [email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] 1
[email protected] [email protected] [email protected] 2
[email protected] [email protected] [email protected] 1
[email protected] [email protected] [email protected] [email protected] 1
[email protected] [email protected] [email protected] 2
[email protected] [email protected] [email protected] 1
[email protected] [email protected] [email protected] 1
[email protected] [email protected] [email protected] [email protected] 1
[email protected] [email protected] [email protected] 1
[email protected] [email protected] [email protected] [email protected] 1
[email protected] [email protected] [email protected] [email protected] 1
[email protected] NAN NAN 6
So far, I am only able to create the hierarchy columns for the dataframe with the code below. Appreciate any form of assistance.
import networkx as nx
# create graph
G = nx.from_pandas_edgelist(df_hc, source='mgr_email', target='emp_email', create_using=nx.DiGraph)
# find roots (= top managers)
roots = [n for n,d in G.in_degree() if d==0]
# for each employee, find the hierarchy
df_hierarchy = (pd.DataFrame([next((p for root in roots for p in nx.all_simple_paths(G, root, node)), [])[:-1] for node in df_hc['emp_email']], index= df_hc.index).rename(columns=lambda x: f'mgr_hier_{x+1:02d}'))
# join to original DataFrame
df_hc2 = df_hc.join(df_hierarchy)
I don't fully understand your concept of teams, but assuming a team is more than one person, then count the descendants that are not leafs:
leafs = {n for n,d in G.out_degree() if d==0}
d = {n: len(nx.descendants_at_distance(G, n, 1)-leafs)
for n in G.nodes}
df_hc['num_teams_if_mgr'] = df_hc['emp_email'].map(d)
Output:
emp_email mgr_email mgr_hier_01 mgr_hier_02 mgr_hier_03 num_teams_if_mgr
0 [email protected] [email protected] [email protected] [email protected] [email protected] 0
1 [email protected] [email protected] [email protected] [email protected] [email protected] 0
2 [email protected] [email protected] [email protected] [email protected] None 0
3 [email protected] [email protected] [email protected] [email protected] None 0
4 [email protected] [email protected] [email protected] [email protected] [email protected] 0
5 [email protected] [email protected] [email protected] [email protected] None 0
6 [email protected] [email protected] [email protected] [email protected] None 0
7 [email protected] [email protected] [email protected] [email protected] [email protected] 0
8 [email protected] [email protected] [email protected] [email protected] None 0
9 [email protected] [email protected] [email protected] [email protected] [email protected] 0
10 [email protected] [email protected] [email protected] [email protected] [email protected] 0
11 [email protected] [email protected] [email protected] [email protected] None 0
12 [email protected] [email protected] [email protected] None None 1
13 [email protected] [email protected] [email protected] None None 0
14 [email protected] [email protected] [email protected] [email protected] None 0
15 [email protected] [email protected] [email protected] None None 1
16 [email protected] [email protected] [email protected] None None 1
17 [email protected] [email protected] [email protected] None None 0
18 [email protected] [email protected] [email protected] [email protected] None 0
19 [email protected] [email protected] [email protected] None None 0
20 [email protected] [email protected] [email protected] [email protected] None 0
21 [email protected] [email protected] [email protected] [email protected] None 0
22 [email protected] NAN NAN None None 6
If you want to count +1 if the employee is itself not a leaf:
leafs = {n for n,d in G.out_degree() if d==0}
d = {n: len(nx.descendants_at_distance(G, n, 1)-leafs)
+ (not n in leafs) for n in G.nodes}
df_hc['num_teams_if_mgr'] = df_hc['emp_email'].map(d)
Output:
emp_email mgr_email mgr_hier_01 mgr_hier_02 mgr_hier_03 num_teams_if_mgr
0 [email protected] [email protected] [email protected] [email protected] [email protected] 0
1 [email protected] [email protected] [email protected] [email protected] [email protected] 0
2 [email protected] [email protected] [email protected] [email protected] None 0
3 [email protected] [email protected] [email protected] [email protected] None 0
4 [email protected] [email protected] [email protected] [email protected] [email protected] 0
5 [email protected] [email protected] [email protected] [email protected] None 0
6 [email protected] [email protected] [email protected] [email protected] None 0
7 [email protected] [email protected] [email protected] [email protected] [email protected] 0
8 [email protected] [email protected] [email protected] [email protected] None 0
9 [email protected] [email protected] [email protected] [email protected] [email protected] 0
10 [email protected] [email protected] [email protected] [email protected] [email protected] 0
11 [email protected] [email protected] [email protected] [email protected] None 0
12 [email protected] [email protected] [email protected] None None 2
13 [email protected] [email protected] [email protected] None None 1
14 [email protected] [email protected] [email protected] [email protected] None 1
15 [email protected] [email protected] [email protected] None None 2
16 [email protected] [email protected] [email protected] None None 2
17 [email protected] [email protected] [email protected] None None 1
18 [email protected] [email protected] [email protected] [email protected] None 1
19 [email protected] [email protected] [email protected] None None 1
20 [email protected] [email protected] [email protected] [email protected] None 1
21 [email protected] [email protected] [email protected] [email protected] None 0
22 [email protected] NAN NAN None None 7
Graph: