python pandas dataframe loops if-statement

What tool should I use to create a new column in a dataframe based on multiple conditions?


I have a column (Column1) in a dataframe that contains work center names followed by the values that belong to each work center. I need to write the matching work center name next to every value in an empty column (Column2). Example:

Input:

Column1: Column2:
WC1 NaN
123 NaN
456 NaN
789 NaN
abc NaN
fgh NaN
WC2 NaN
123 NaN
abc NaN
123 NaN

Output:

Column1: Column2:
WC1 WC1
123 WC1
456 WC1
789 WC1
abc WC1
fgh WC1
WC2 WC2
123 WC2
abc WC2
123 WC2

I tried many things and read the famous "DON'T" answer (https://stackoverflow.com/a/55557758), but I still can't work out what I should use. I want a loop to go through the rows one by one, assigning WC1 only to the values below it, and to stop assigning WC1 as soon as it finds WC2.


Solution

  • Use Series.where to keep only the rows whose value contains WC (tested with Series.str.contains) and set every other row to NaN, then forward-fill those missing values with Series.ffill:

    df['Column2'] = df['Column1'].where(df['Column1'].str.contains('WC')).ffill()
    print (df)
      Column1 Column2
    0     WC1     WC1
    1     123     WC1
    2     456     WC1
    3     789     WC1
    4     abc     WC1
    5     fgh     WC1
    6     WC2     WC2
    7     123     WC2
    8     abc     WC2
    9     123     WC2
    

    How it works:

    print (df['Column1'].str.contains('WC'))
    0     True
    1    False
    2    False
    3    False
    4    False
    5    False
    6     True
    7    False
    8    False
    9    False
    Name: Column1, dtype: bool
    
    print (df['Column1'].where(df['Column1'].str.contains('WC')))
    0    WC1
    1    NaN
    2    NaN
    3    NaN
    4    NaN
    5    NaN
    6    WC2
    7    NaN
    8    NaN
    9    NaN
    Name: Column1, dtype: object
    

    print (df['Column1'].where(df['Column1'].str.contains('WC')).ffill())
    0    WC1
    1    WC1
    2    WC1
    3    WC1
    4    WC1
    5    WC1
    6    WC2
    7    WC2
    8    WC2
    9    WC2
    Name: Column1, dtype: object
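
The steps above can be put together as a self-contained sketch that rebuilds the sample data from the question. The helper name `label_work_centers`, the anchored pattern `^WC`, and the `astype(str)` / `na=False` guards are assumptions added for illustration and are not part of the original answer. They are meant to cover the case where Column1 holds real numbers or missing values instead of strings.

```python
import pandas as pd

# Sample data mirroring the question; all values are strings here.
df = pd.DataFrame(
    {"Column1": ["WC1", "123", "456", "789", "abc", "fgh", "WC2", "123", "abc", "123"]}
)


def label_work_centers(col: pd.Series, pattern: str = r"^WC") -> pd.Series:
    """Hypothetical helper: keep values matching `pattern`, then forward-fill."""
    # astype(str) lets str.contains handle numeric entries;
    # na=False makes missing values count as "not a work center".
    is_work_center = col.astype(str).str.contains(pattern, na=False)
    # Non-matching rows become NaN, then ffill copies the last match downward.
    return col.where(is_work_center).ffill()


df["Column2"] = label_work_centers(df["Column1"])
print(df)
```

Any rows that come before the first work center have nothing to fill from, so they stay NaN. Anchoring the pattern with `^` only matters if ordinary values could contain "WC" somewhere in the middle.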