Search code examples
pythonpandasdataframejupyter-notebookcalculated-columns

Using Python - How can I create a new column ("new_col") by returning the value of "colA" if "colA" can be found in "colB"


I am stuck on a project. I am trying to create a new unique column by checking two columns (A & B), if the values in A exist anywhere in B, or the value of B exist anywhere in A return that value, else return an "". For example, I have;

    colA colB
0    x     
1    y     
2         c
3         d
4         x
5    d     
6          

After comparing colA and colB for the first time, I am expecting something like this;

  colA colB new_colA
0    x             x
1    y             y
2         c         
3         d        d
4         x        x
5    d             d
6                   

And the for the second time;

  colA colB new_colA new_colB
0    x             x         
1    y             y         
2         c                 c
3         d        d         
4         x        x         
5    d             d         
6                            

I don't know how to go about it using python. I tried excel where I just used conditional formatting to highlight duplicates.


Solution

  • If you have NaNs in empty cells, you can use:

    m = df['colB'].isin(df['colA'])
    df['new_colA'] = df['colB'].where(m).fillna(df['colA'])
    df['new_colB'] = df['colB'].mask(m)
    

    Output:

      colA colB new_colA new_colB
    0    x  NaN        x      NaN
    1    y  NaN        y      NaN
    2  NaN    c      NaN        c
    3  NaN    d        d      NaN
    4  NaN    x        x      NaN
    5    d  NaN        d      NaN
    6  NaN  NaN      NaN      NaN
    

    Variant for empty strings:

    m = df['colB'].isin(df['colA'])&df['colB'].ne('')
    df['new_colA'] = df['colB'].where(m).fillna(df['colA'])
    df['new_colB'] = df['colB'].mask(m).fillna('')
    

    Output:

      colA colB new_colA new_colB
    0    x             x         
    1    y             y         
    2         c                 c
    3         d        d         
    4         x        x         
    5    d             d         
    6