How can I create a new column and fill it with values based on the condition of two other columns?
input:
import pandas as pd
import numpy as np
list1 = ['no','no','yes','yes','no','no','no','yes','no','yes','yes','no','no','no']
list2 = ['no','no','no','no','no','yes','yes','no','no','no','no','no','yes','no']
df = pd.DataFrame({'A' : list1, 'B' : list2}, columns = ['A', 'B'])
df['C'] = np.where ((df['A'] == 'yes') & (df['A'].shift(1) == 'no'), 'X', np.nan)
df['D'] = 'nan','nan','X','X','X','X','nan','X','X','X','X','X','X','nan'
print (df)
output:
A B C D
0 no no nan nan
1 no no nan nan
2 yes no X X
3 yes no nan X
4 no no nan X
5 no yes nan X
6 no yes nan nan
7 yes no X X
8 no no nan X
9 yes no X X
10 yes no nan X
11 no no nan X
12 no yes nan X
13 no no nan nan
Columns A and B will be givens and only contain 'yes' or 'no' values. There can only be three possible pairs ('no'-'no', 'yes'-'no', or 'no'-'yes'). There can never be a 'yes'-'yes' pair.
The goal is to place an 'X' in the new column when a 'yes'-'no' pair is encountered and then to continue filling in 'X's until there is a 'no'-'yes' pair. This could happen over a few rows or several hundred rows.
Column D shows the desired output.
Column C is the current failing attempt.
Can anyone help? Thanks in advance.
Try this:
df["E"] = np.nan
# Use boolean indexing to set no-yes to placeholder value
df.loc[(df["A"] == "no") & (df["B"] == "yes"), "E"] = "PL"
# Shift placeholder down by one, as it seems from your example
# that you want X to be on the no-yes "stopping" row
df["E"] = df.E.shift(1)
# Then set the X value on the yes-no rows
df.loc[(df.A == "yes") & (df.B == "no"), "E"] = "X"
df["E"] = df.E.ffill() # Fill forward
# Fix placeholders
df.loc[df.E == "PL", "E"] = np.nan
Results:
A B C D E
0 no no nan nan NaN
1 no no nan nan NaN
2 yes no X X X
3 yes no nan X X
4 no no nan X X
5 no yes nan X X
6 no yes nan nan NaN
7 yes no X X X
8 no no nan X X
9 yes no X X X
10 yes no nan X X
11 no no nan X X
12 no yes nan X X
13 no no nan nan NaN