Search code examples
pythonpandasdataframemultiple-columns

Create "Yes" column according to another column value pandas dataframe


Imagine I have a dataframe with employee IDs, their Contract Number, and the Company they work for. Each employee can have as many contracts as they want for the same company or even for different companies:

ID  Contract Number Company
10000   1           Abc
10000   2           Zxc
10000   3           Abc
10001   1           Zxc
10002   2           Abc
10002   1           Cde
10002   3           Zxc

I need to find a way to identify the company of the contract number "1" per each ID and then create a column "Primary Contract" that would be set to "Yes" if the contract is in the same company as the company of contract number 1 resulting on this dataframe:

ID  Contract Number Company Primary Compay
10000   1            Abc           Yes
10000   2            Zxc           No
10000   3            Abc           Yes
10001   1            Zxc           Yes
10002   2            Abc           No
10002   1            Cde           Yes
10002   3            Zxc           No

What would be the best way to achieve it?


Solution

  • Filter rows with Contract Number is 1, use left join in DataFrame.merge and compare _merge column generated by indicator=True parameter:

    mask = (df.merge(df[df['Contract Number'].eq(1)],
                    how='left', on=['ID','Company'], indicator=True)['_merge'].eq('both'))
    df['Primary Company'] = np.where(mask, 'Yes','No')
    print (df)
          ID  Contract Number Company Primary Company
    0  10000                1     Abc             Yes
    1  10000                2     Zxc              No
    2  10000                3     Abc             Yes
    3  10001                1     Zxc             Yes
    4  10002                2     Abc              No
    5  10002                1     Cde             Yes
    6  10002                3     Zxc              No
    

    Another idea is with compare MultiIndex by Index.isin:

    idx = df[df['Contract Number'].eq(1)].set_index(['ID','Company']).index
    df['Primary Company'] = np.where(df.set_index(['ID','Company']).index.isin(idx),
                                     'Yes','No')
    print (df)
          ID  Contract Number Company Primary Company
    0  10000                1     Abc             Yes
    1  10000                2     Zxc              No
    2  10000                3     Abc             Yes
    3  10001                1     Zxc             Yes
    4  10002                2     Abc              No
    5  10002                1     Cde             Yes
    6  10002                3     Zxc              No