Search code examples
pythonpandasmergelines-of-code

Create extra column with conditional pandas dataframe


I have a dataframe with employee IDs, the job they perform, the location the job is performed and a column that designates if that job is the primary job of that employee, like this one:

ID  Job     Location        Main Job?
10  Driver  Peru street     Yes
11  Nurse   Jorge street    No
11  Helper  Jorge street    Yes
11  Driver  Phoenix av.     No
11  Admin   Jorge street    No
12  Driver  Yard av.        Yes
12  Helper  Yard av.        No
12  Nurse   Jorge street    No
13  Driver  Peru street     Yes
13  Helper  Jorge street    No

As you can see, an employee may perform different jobs on the same Location but only one job is marked as "Yes" in the "Main Job?" column by each unique "ID".

I need now to create a column named "Primary Location?" which always mark as "Yes" if the job performed by each employee (represented by the unique ID) is performed on the Location the "Main Job" also occurs (and setting as No if do not occur in the Location of the main job). This way resulting on the following dataframe:

ID  Job     Location       Main Job?    Primary Location?
10  Driver  Peru street     Yes         Yes
11  Nurse   Jorge street    No          Yes
11  Helper  Jorge street    Yes         Yes
11  Driver  Phoenix av.     No          No
11  Admin   Jorge street    No          Yes
12  Driver  Yard av.        Yes         Yes
12  Helper  Yard av.        No          Yes
12  Nurse   Jorge street    No          No
13  Driver  Peru street     Yes         Yes
13  Helper  Jorge street    No          No

What is the best way to achieve this logic?

Thank you!


Solution

  • You need to do the following:

    import pandas as pd
    
    df = pd.DataFrame({
      'ID': [10, 11, 11, 11, 11, 12, 12, 12, 13, 13],
      'Job': ['Driver', 'Nurse', 'Helper', 'Driver', 'Admin', 'Driver', 'Helper', 'Nurse', 'Driver', 'Helper'],
      'Location': ['Peru street', 'Jorge street', 'Jorge street', 'Phoenix av.', 'Jorge street', 'Yard av.', 'Yard av.', 'Jorge street', 'Peru street', 'Jorge street'],
      'Main Job?': ['Yes', 'No', 'Yes', 'No', 'No', 'Yes', 'No', 'No', 'Yes', 'No']
    })
    
    # Filter Main Locations
    main_jobs = df.loc[df['Main Job?'] == 'Yes'][['ID', 'Location']].rename(columns = {'Location': 'Main Location'})
    
    # Add Main Location Column
    df = df.merge(main_jobs, on=['ID'], how='left')
    
    # Build Primary Location Column
    df['Primary Location?'] = df['Location'] == df['Main Location']
    
    # Convert True/False to String
    df['Primary Location?'] = np.where(df['Primary Location?'], 'Yes', 'No')
    
    # Select Relevant columns
    df = df[['ID', 'Job', 'Location', 'Main Job?', 'Primary Location?']]
    print(df)
    

    Outputs:

    >>>
       ID     Job      Location Main Job? Primary Location?
    0  10  Driver   Peru street       Yes               Yes
    1  11   Nurse  Jorge street        No               Yes
    2  11  Helper  Jorge street       Yes               Yes
    3  11  Driver   Phoenix av.        No                No
    4  11   Admin  Jorge street        No               Yes
    5  12  Driver      Yard av.       Yes               Yes
    6  12  Helper      Yard av.        No               Yes
    7  12   Nurse  Jorge street        No                No
    8  13  Driver   Peru street       Yes               Yes
    9  13  Helper  Jorge street        No                No
    

    Hope it helps! :)