Search code examples
pythonpandasdataframeassign

Assign column with conditional values based on strings contained in other columns


I am trying to assign a column based on strings that may be contained in other columns. For example

var1 = 67

columns = {'col1': ['string1', 'thang2', 'code3', 'string2'], 
          'col2': [1, 2, np.nan, 3], 'col3': ['I', 'cant', 'think', 'what']}

df = pd.DataFrame(data = columns)

How do I then make a fourth column col4 that is col3 + var1 + col1 most of the time, but is np.nan whenever col2 is nan (in the same row) and has a -W appended to its value whenever there is an 'in' in any the string in col1 (again, in the same row)?

I know all about assign, but I don't know how to do all that conditional stuff in the assign, or if there is a way to do it after creating the column, I'm not sure either.


Solution

  • You can try this using np.where:

    df['col4'] = np.where(df['col2'].notnull(),
                          df['col3'] + str(var1) + np.where(df['col1'].str.contains('in'),
                                                            df['col1'] + '-w',
                                                            df['col1']), 
                          np.nan)
    

    Output:

          col1  col2   col3             col4
    0  string1   1.0      I     I67string1-w
    1   thang2   2.0   cant     cant67thang2
    2    code3   NaN  think              NaN
    3  string2   3.0   what  what67string2-w
    

    Or if you want to do it with assign:

    df.assign(col5 = np.where(df['col2'].notnull(),
             df['col3'] + str(var1) + np.where(df['col1'].str.contains('in'),
                                               df['col1'] + '-w',
                                               df['col1']), 
             np.nan))
    

    Output:

          col1  col2   col3             col4             col5
    0  string1   1.0      I     I67string1-w     I67string1-w
    1   thang2   2.0   cant     cant67thang2     cant67thang2
    2    code3   NaN  think              NaN              NaN
    3  string2   3.0   what  what67string2-w  what67string2-w
    

    Update: Since you mentioned speed. I think I'd remove the .str accessor and use list comprehension too.

    df['col4'] = np.where(df['col2'].notnull(),
             df['col3'] + str(var1) + np.where(['in' in i for i in df['col1']], 
                                               df['col1'] + '-w', 
                                               df['col1']), 
             np.nan)