I have the following dataset which I'm looking at banking approval data by two different date months (EOM Date = is when it was uploaded & Approve Date = When facility was actually approved).
df = pd.DataFrame(data=[["1/08/23","1/08/23","yy1"],
["1/08/23","1/08/23","yy2"],
["1/08/23","1/07/23","yy3"],
["1/07/23","1/07/23","yy4"],
["1/07/23", "1/06/23", "yy5"],
["1/07/23", "1/07/23", "yy6"],
["1/06/23", "1/06/23", "yy7"],
["1/06/23", "1/05/23", "yy8"],
["1/06/23", "1/04/23", "yy9"]],
columns= ["EOM Date","Approve Date","Facility Id"])
The desired output is if the 'EOM Date' and 'Approve Date' match id like that in the output. Currently, I am able to create this by using the numpy np.where(EOM Date == Approve Date, Facility Id, 0)
etc. However, I need to do this in a loop format.
Output
df = pd.DataFrame(data=[["1/08/23","1/08/23","yy1"],
["1/08/23","1/08/23","yy2"],
["1/07/23","1/07/23","yy4"],
["1/07/23", "1/07/23", "yy6"],
["1/06/23", "1/06/23", "yy7"]],
columns= ["EOM Date","Approve Date","Facility Id"])
EOM Date | Approve Date | Facility Id |
---|---|---|
1/08/23 | 1/08/23 | yy1 |
1/08/23 | 1/08/23 | yy2 |
1/07/23 | 1/07/23 | yy4 |
1/07/23 | 1/07/23 | yy6 |
1/06/23 | 1/06/23 | yy7 |
You can use iterrows()
to loop over the dataframe (list comprehensions format)
lst = [val for i, val in df.iterrows() if val['EOM Date'] == val['Approve Date']]
df = pd.DataFrame(lst)
But the efficient way is to use np.where
or broadcasting
df = df[df['EOM Date'] == df['Approve Date']]
Both will produce
EOM Date Approve Date Facility Id
0 1/08/23 1/08/23 yy1
1 1/08/23 1/08/23 yy2
3 1/07/23 1/07/23 yy4
5 1/07/23 1/07/23 yy6
6 1/06/23 1/06/23 yy7