Search code examples
pythonappendmultiple-columns

Is there a shorter way to create loops trough the rows when using append?


I have a data frame with employees and all the roles that they are able to do.

`

    Employees   ID  Brand_Manager   Payroll_Manager Accountant  Auditor
0   Jessi           1A         1           0           1      0
1   Lara            1B         1           0           0      1
2   Mike            1C         1           0           0          0
3   Artur           1D         1           0           0      0
4   James           2A         1           0           0      0
5   Claudia         3B         1           0           0          0
6   Zuzska          4C         1           1           0          1
7   Bartz           2B         1           1           0      0
8   Alexa           3B         1           1           0      0

`

To make work the program that I want to apply, I need to split the data and create new rows for the same person for each role (value=1). The rest of the values for the rest of the roles will become 0. The codes work well when using append(), however this example contain only 8 employees and 4 roles. I need to do the same with a lot more employees and almost 100 extra roles, which will create a very long script.

I have done this: First, select all the employees that have more than one role

col_list=df.columns
df['many'] = df[col_list].sum(axis=1)
df_single   =  df[ df['many'] == 1  ]
df_many    =  df[ df['many'] >= 2  ]

Then create lists and append:

lststaff = list ( df_many.Employees)

lstEmployees = []
lstID = []
lstBrand_Manager = []
lstPayroll_Manager = []
lstAccountant = []
lstAuditor = []

Loop through the names

for i in lststaff:


  ID = str ( df_many.loc [ df_many['Employees'] == i, 'ID' ].tolist()[0] ) 


  Brand_Manager = ( df_many.loc [ df_many['Employees'] == i, 'Brand_Manager'].astype(int) )
  Brand_Manager = np.array(Brand_Manager)
  if ( Brand_Manager == 1 ).any():  
      lstEmployees.append ( i + '_Brand_Manager'  )
      lstAccountant.append (0)
      lstBrand_Manager.append(1)
      lstAuditor.append(0)
      lstID.append (ID)
      lstPayroll_Manager.append(0)
  
    
  Accountant = ( df_many.loc [ df_many['Employees'] == i, 'Accountant'].astype(int) )
  Accountant = np.array(Accountant)
  if ( Accountant == 1 ).any():  
      lstEmployees.append ( i + '_Accountant'  )
      lstAccountant.append (1)
      lstBrand_Manager.append(0)
      lstAuditor.append(0)
      lstID.append (ID)
      lstPayroll_Manager.append(0)
    
  Auditor = ( df_many.loc [ df_many['Employees'] == i, 'Auditor'].astype(int) )
  Auditor = np.array(Auditor)
  if ( Auditor == 1 ).any():  
      lstEmployees.append ( i + '_Auditor' )
      lstAccountant.append (0)
      lstBrand_Manager.append(0)
      lstAuditor.append(1)
      lstID.append (ID)
      lstPayroll_Manager.append(0)
    
  Payroll_Manager = ( df_many.loc [ df_many['Employees'] == i, 'Payroll_Manager'].astype(int) )
  Payroll_Manager = np.array(Payroll_Manager)
  if ( Payroll_Manager == 1 ).any():  
      lstEmployees.append ( i + '_Payroll_Manager'  )
      lstAccountant.append (0)
      lstBrand_Manager.append(0)
      lstAuditor.append(0)
      lstID.append (ID)
      lstPayroll_Manager.append(1)
    


final_df = pd.DataFrame ( {
   "Employees"  : lstEmployees
   ,"ID"   : lstID
   ,"Brand_Manager" :lstBrand_Manager
   ,"Accountant" :lstAccountant
   ,"Auditor" : lstAuditor
   ,"Payroll_Manager" : lstPayroll_Manager 
}
)

final_df

The codes works well, however if I have to add 100 more roles. I will add 100 more lists and the amount of lists to be appended would be crazy.... Is there any other way of doing it with a function or for loops?

The output is as follow:

    Employees   ID  Brand_Manager   Accountant  Auditor Payroll_Manager
0   Jessi_Brand_Manager 1A  1   0   0   0
1   Jessi_Accountant    1A  0   1   0   0
2   Lara_Brand_Manager  1B  1   0   0   0
3   Lara_Auditor    1B  0   0   1   0
4   Zuzska_Brand_Manager    4C  1   0   0   0
5   Zuzska_Auditor  4C  0   0   1   0
6   Zuzska_Payroll_Manager  4C  0   0   0   1
7   Bartz_Brand_Manager 2B  1   0   0   0
8   Bartz_Payroll_Manager   2B  0   0   0   1
9   Alexa_Brand_Manager 3B  1   0   0   0
10  Alexa_Payroll_Manager   3B  0   0   0   1

Solution

  • Here's one approach. pandas gives a "SettingWithCopy" warning, but I believe this script will always result in the expected behavior.

    import numpy as np
    import pandas as pd
    
    #input dataframe
    df = pd.DataFrame({'Employees': {0: 'Jessi', 1: 'Lara', 2: 'Mike', 3: 'Artur', 4: 'James', 5: 'Claudia', 6: 'Zuzska', 7: 'Bartz', 8: 'Alexa'}, 'ID': {0: '1A', 1: '1B', 2: '1C', 3: '1D', 4: '2A', 5: '3B', 6: '4C', 7: '2B', 8: '3B'}, 'Brand_Manager': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1}, 'Payroll_Manager': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 1, 7: 1, 8: 1}, 'Accountant': {0: 1, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0}, 'Auditor': {0: 0, 1: 1, 2: 0, 3: 0, 4: 0, 5: 0, 6: 1, 7: 0, 8: 0}})
    
    # isolate job columns
    job_cols = df.columns.drop(['Employees','ID'])
    num_jobs = len(job_cols)
    
    # drop employees with only one job
    df_in = df.loc[df[job_cols].sum(axis = 1)>=2,:]
    
    # information processing
    def get_nums(row):
        return np.eye(num_jobs, dtype = int)[row.astype(bool)]
    def get_labels(row):
        return job_cols[row.astype(bool)]
    
    df_proc = df_in[['Employees','ID']]
    df_proc.loc[:,'all'] = df_in[job_cols].apply(get_nums,axis = 1)
    df_proc = df_proc.explode('all').reset_index(drop=True)
    df_proc[job_cols] = np.vstack(df_proc['all'])
    df_proc['jobs'] = df_in[job_cols].apply(get_labels,axis = 1).explode().reset_index(drop=True)
    
    # generate output dataframe
    df_out = df_proc[['Employees','ID',*job_cols]]
    df_out.loc[:,'Employees'] = df_proc['Employees'] + '_' + df_proc['jobs']
    

    Resulting dataframe df_out:

                     Employees  ID  Brand_Manager  Payroll_Manager  Accountant  \
    0      Jessi_Brand_Manager  1A              1                0           0   
    1         Jessi_Accountant  1A              0                0           1   
    2       Lara_Brand_Manager  1B              1                0           0   
    3             Lara_Auditor  1B              0                0           0   
    4     Zuzska_Brand_Manager  4C              1                0           0   
    5   Zuzska_Payroll_Manager  4C              0                1           0   
    6           Zuzska_Auditor  4C              0                0           0   
    7      Bartz_Brand_Manager  2B              1                0           0   
    8    Bartz_Payroll_Manager  2B              0                1           0   
    9      Alexa_Brand_Manager  3B              1                0           0   
    10   Alexa_Payroll_Manager  3B              0                1           0   
    
        Auditor  
    0         0  
    1         0  
    2         0  
    3         1  
    4         0  
    5         0  
    6         1  
    7         0  
    8         0  
    9         0  
    10        0 
    

    Intermediate dataframe df_proc before generating df_out, for reference:

       Employees  ID           all  Brand_Manager  Payroll_Manager  Accountant  \
    0      Jessi  1A  [1, 0, 0, 0]              1                0           0   
    1      Jessi  1A  [0, 0, 1, 0]              0                0           1   
    2       Lara  1B  [1, 0, 0, 0]              1                0           0   
    3       Lara  1B  [0, 0, 0, 1]              0                0           0   
    4     Zuzska  4C  [1, 0, 0, 0]              1                0           0   
    5     Zuzska  4C  [0, 1, 0, 0]              0                1           0   
    6     Zuzska  4C  [0, 0, 0, 1]              0                0           0   
    7      Bartz  2B  [1, 0, 0, 0]              1                0           0   
    8      Bartz  2B  [0, 1, 0, 0]              0                1           0   
    9      Alexa  3B  [1, 0, 0, 0]              1                0           0   
    10     Alexa  3B  [0, 1, 0, 0]              0                1           0   
    
        Auditor             jobs  
    0         0    Brand_Manager  
    1         0       Accountant  
    2         0    Brand_Manager  
    3         1          Auditor  
    4         0    Brand_Manager  
    5         0  Payroll_Manager  
    6         1          Auditor  
    7         0    Brand_Manager  
    8         0  Payroll_Manager  
    9         0    Brand_Manager  
    10        0  Payroll_Manager