Search code examples
pythonpandasdataframegroup-byhierarchy

Python - How to get number of teams under a manager using manager hierarchy columns


I have a dataframe with the employee emails, manager emails and the manager hierarchy columns. Am trying to get the number of teams that a manager has.

My current dataframe

emp_email        mgr_email       mgr_hier_01    mgr_hier_02     mgr_hier_03            
[email protected]     [email protected]    [email protected]    [email protected]   [email protected]     
[email protected]     [email protected]    [email protected]    [email protected]   [email protected]     
[email protected]    [email protected]   [email protected]    [email protected]
[email protected]    [email protected]   [email protected]    [email protected]                                       
[email protected]      [email protected]   [email protected]    [email protected]    [email protected]    
[email protected]    [email protected]    [email protected]    [email protected]                     
[email protected]     [email protected]    [email protected]    [email protected]                     
[email protected]   [email protected]  [email protected]    [email protected]    [email protected]   
[email protected]  [email protected]    [email protected]    [email protected]                     
[email protected]     [email protected]    [email protected]    [email protected]    [email protected]     
[email protected]   [email protected]   [email protected]    [email protected]   [email protected]    
[email protected]     [email protected]    [email protected]    [email protected]                                    
[email protected]    [email protected]     [email protected]                                     
[email protected]    [email protected]     [email protected]                                     
[email protected]     [email protected]   [email protected]    [email protected]                    
[email protected]     [email protected]     [email protected]                                     
[email protected]     [email protected]     [email protected]                                     
[email protected]     [email protected]     [email protected]                                     
[email protected]    [email protected]    [email protected]    [email protected]                      
[email protected]     [email protected]     [email protected]                                     
[email protected]   [email protected]    [email protected]    [email protected]
[email protected]    [email protected]   [email protected]    [email protected]                    
[email protected]      NAN             NAN                                             

what I hope to achieve is a column which gives the counts of number of teams a manager has if the employee is a manager. For example, [email protected] has 2 managers reporting to her ([email protected] and [email protected]) so the count of teams under her should be 2. While [email protected] has no managers reporting to him but he is a manager managing 2 individual contributors ([email protected] and [email protected]). So the count of teams under [email protected] should be 1.

emp_email        mgr_email       mgr_hier_01    mgr_hier_02     mgr_hier_03      num_teams_if_mgr     
[email protected]     [email protected]    [email protected]    [email protected]   [email protected]     0
[email protected]     [email protected]    [email protected]    [email protected]   [email protected]     0
[email protected]    [email protected]   [email protected]    [email protected]                    0
[email protected]    [email protected]   [email protected]    [email protected]                    0
[email protected]      [email protected]   [email protected]    [email protected]    [email protected]    0
[email protected]    [email protected]    [email protected]    [email protected]                     0
[email protected]     [email protected]    [email protected]    [email protected]                     0
[email protected]   [email protected]  [email protected]    [email protected]    [email protected]   0
[email protected]  [email protected]    [email protected]    [email protected]                     0
[email protected]     [email protected]   [email protected]    [email protected]    [email protected]     0
[email protected]   [email protected]   [email protected]    [email protected]   [email protected]    0
[email protected]     [email protected]    [email protected]    [email protected]                     1                
[email protected]    [email protected]     [email protected]                                     2
[email protected]    [email protected]     [email protected]                                     1
[email protected]     [email protected]   [email protected]    [email protected]                    1
[email protected]     [email protected]     [email protected]                                     2
[email protected]     [email protected]     [email protected]                                     1
[email protected]     [email protected]     [email protected]                                     1 
[email protected]    [email protected]    [email protected]    [email protected]                     1  
[email protected]     [email protected]     [email protected]                                     1
[email protected]   [email protected]    [email protected]    [email protected]                     1  
[email protected]    [email protected]   [email protected]    [email protected]                    1
[email protected]      NAN             NAN                                             6

So far, I am only able to create the hierarchy columns for the dataframe with the code below. Appreciate any form of assistance.

import networkx as nx

# create graph
G = nx.from_pandas_edgelist(df_hc, source='mgr_email', target='emp_email', create_using=nx.DiGraph)

# find roots (= top managers)
roots = [n for n,d in G.in_degree() if d==0]

# for each employee, find the hierarchy 
df_hierarchy = (pd.DataFrame([next((p for root in roots for p in nx.all_simple_paths(G, root, node)), [])[:-1] for node in df_hc['emp_email']], index= df_hc.index).rename(columns=lambda x: f'mgr_hier_{x+1:02d}'))

# join to original DataFrame
df_hc2 = df_hc.join(df_hierarchy)

Solution

  • I don't fully understand your concept of teams, but assuming a team is more than one person, then count the descendants that are not leafs:

    leafs = {n for n,d in G.out_degree() if d==0}
    d = {n: len(nx.descendants_at_distance(G, n, 1)-leafs)
         for n in G.nodes}
    df_hc['num_teams_if_mgr'] = df_hc['emp_email'].map(d)
    

    Output:

              emp_email       mgr_email  mgr_hier_01    mgr_hier_02     mgr_hier_03  num_teams_if_mgr
    0      [email protected]    [email protected]  [email protected]  [email protected]    [email protected]                 0
    1      [email protected]    [email protected]  [email protected]  [email protected]    [email protected]                 0
    2     [email protected]   [email protected]  [email protected]  [email protected]            None                 0
    3     [email protected]   [email protected]  [email protected]  [email protected]            None                 0
    4       [email protected]   [email protected]  [email protected]   [email protected]   [email protected]                 0
    5     [email protected]    [email protected]  [email protected]   [email protected]            None                 0
    6      [email protected]    [email protected]  [email protected]   [email protected]            None                 0
    7    [email protected]  [email protected]  [email protected]   [email protected]  [email protected]                 0
    8   [email protected]    [email protected]  [email protected]   [email protected]            None                 0
    9      [email protected]   [email protected]  [email protected]   [email protected]    [email protected]                 0
    10   [email protected]   [email protected]  [email protected]  [email protected]   [email protected]                 0
    11     [email protected]    [email protected]  [email protected]   [email protected]            None                 0
    12    [email protected]     [email protected]  [email protected]           None            None                 1
    13    [email protected]     [email protected]  [email protected]           None            None                 0
    14     [email protected]   [email protected]  [email protected]  [email protected]            None                 0
    15     [email protected]     [email protected]  [email protected]           None            None                 1
    16     [email protected]     [email protected]  [email protected]           None            None                 1
    17     [email protected]     [email protected]  [email protected]           None            None                 0
    18    [email protected]    [email protected]  [email protected]   [email protected]            None                 0
    19     [email protected]     [email protected]  [email protected]           None            None                 0
    20   [email protected]    [email protected]  [email protected]   [email protected]            None                 0
    21    [email protected]   [email protected]  [email protected]  [email protected]            None                 0
    22      [email protected]             NAN          NAN           None            None                 6
    

    If you want to count +1 if the employee is itself not a leaf:

    leafs = {n for n,d in G.out_degree() if d==0}
    d = {n: len(nx.descendants_at_distance(G, n, 1)-leafs)
            + (not n in leafs) for n in G.nodes}
    df_hc['num_teams_if_mgr'] = df_hc['emp_email'].map(d)
    

    Output:

              emp_email       mgr_email  mgr_hier_01    mgr_hier_02     mgr_hier_03  num_teams_if_mgr
    0      [email protected]    [email protected]  [email protected]  [email protected]    [email protected]                 0
    1      [email protected]    [email protected]  [email protected]  [email protected]    [email protected]                 0
    2     [email protected]   [email protected]  [email protected]  [email protected]            None                 0
    3     [email protected]   [email protected]  [email protected]  [email protected]            None                 0
    4       [email protected]   [email protected]  [email protected]   [email protected]   [email protected]                 0
    5     [email protected]    [email protected]  [email protected]   [email protected]            None                 0
    6      [email protected]    [email protected]  [email protected]   [email protected]            None                 0
    7    [email protected]  [email protected]  [email protected]   [email protected]  [email protected]                 0
    8   [email protected]    [email protected]  [email protected]   [email protected]            None                 0
    9      [email protected]   [email protected]  [email protected]   [email protected]    [email protected]                 0
    10   [email protected]   [email protected]  [email protected]  [email protected]   [email protected]                 0
    11     [email protected]    [email protected]  [email protected]   [email protected]            None                 0
    12    [email protected]     [email protected]  [email protected]           None            None                 2
    13    [email protected]     [email protected]  [email protected]           None            None                 1
    14     [email protected]   [email protected]  [email protected]  [email protected]            None                 1
    15     [email protected]     [email protected]  [email protected]           None            None                 2
    16     [email protected]     [email protected]  [email protected]           None            None                 2
    17     [email protected]     [email protected]  [email protected]           None            None                 1
    18    [email protected]    [email protected]  [email protected]   [email protected]            None                 1
    19     [email protected]     [email protected]  [email protected]           None            None                 1
    20   [email protected]    [email protected]  [email protected]   [email protected]            None                 1
    21    [email protected]   [email protected]  [email protected]  [email protected]            None                 0
    22      [email protected]             NAN          NAN           None            None                 7
    

    Graph:

    enter image description here