Search code examples
pythonpandasconditional-statementscalculated-columns

how to create and fill a new column based on conditions in two other columns?


How can I create a new column and fill it with values based on the condition of two other columns?

input:

    import pandas as pd
    import numpy as np

    list1 = ['no','no','yes','yes','no','no','no','yes','no','yes','yes','no','no','no']
    list2 = ['no','no','no','no','no','yes','yes','no','no','no','no','no','yes','no']

    df = pd.DataFrame({'A' : list1, 'B' : list2}, columns = ['A', 'B'])

    df['C'] = np.where ((df['A'] == 'yes') & (df['A'].shift(1) == 'no'), 'X', np.nan)
    df['D'] = 'nan','nan','X','X','X','X','nan','X','X','X','X','X','X','nan'

    print (df)

output:

          A    B    C    D
    0    no   no  nan  nan
    1    no   no  nan  nan
    2   yes   no    X    X
    3   yes   no  nan    X
    4    no   no  nan    X
    5    no  yes  nan    X
    6    no  yes  nan  nan
    7   yes   no    X    X
    8    no   no  nan    X
    9   yes   no    X    X
    10  yes   no  nan    X
    11   no   no  nan    X
    12   no  yes  nan    X
    13   no   no  nan  nan

Columns A and B will be givens and only contain 'yes' or 'no' values. There can only be three possible pairs ('no'-'no', 'yes'-'no', or 'no'-'yes'). There can never be a 'yes'-'yes' pair.

The goal is to place an 'X' in the new column when a 'yes'-'no' pair is encountered and then to continue filling in 'X's until there is a 'no'-'yes' pair. This could happen over a few rows or several hundred rows.

Column D shows the desired output.

Column C is the current failing attempt.

Can anyone help? Thanks in advance.


Solution

  • Try this:

    df["E"] = np.nan
    
    # Use boolean indexing to set no-yes to placeholder value
    df.loc[(df["A"] == "no") & (df["B"] == "yes"), "E"] = "PL"
    
    # Shift placeholder down by one, as it seems from your example
    # that you want X to be on the no-yes "stopping" row
    df["E"] = df.E.shift(1)
    
    # Then set the X value on the yes-no rows
    df.loc[(df.A == "yes") & (df.B == "no"), "E"] = "X"
    df["E"] = df.E.ffill() # Fill forward
    
    # Fix placeholders
    df.loc[df.E == "PL", "E"] = np.nan
    

    Results:

        A   B   C   D   E
    0   no  no  nan nan NaN
    1   no  no  nan nan NaN
    2   yes no  X   X   X
    3   yes no  nan X   X
    4   no  no  nan X   X
    5   no  yes nan X   X
    6   no  yes nan nan NaN
    7   yes no  X   X   X
    8   no  no  nan X   X
    9   yes no  X   X   X
    10  yes no  nan X   X
    11  no  no  nan X   X
    12  no  yes nan X   X
    13  no  no  nan nan NaN