Search code examples
pythonpandasconditional-statementsfindrolling-computation

Pandas: Pandas Rolling Condition to find smallest values and return the matched result


I have the Data frame with two set of values A & B in column 'ID'.

Need to find the smallest value from previous rows in the same columns 'MT' & 'Values'.

If 'MT' & 'Value's column current row value is smaller than its previous values the 'Matched Values' column & 'Date Values' column has to be updated with the conditions that matched.

data = {'ID': ['A', 'A', 'A', 'A','A','B', 'B', 'B', 'B','B'],
        'Date': ['02-05-2023', '03-05-2023', '04-05-2023', '05-05-2023','06-05-2023','02-05-2023', '03-05-2023', '04-05-2023', '05-05-2023','06-05-2023'],
      'MT': [10, 20, 15, 7, 25, 10, 20, 15, 7, 25],
        'Values': [100.5, 200, 150.35, 285,250,100.5, 200, 150.35, 285,250]}

df = pd.DataFrame(data)

print(df)

Available Data frame:
*******************

  ID        Date  MT  Values
0  A  02-05-2023  10  100.50
1  A  03-05-2023  20  200.00
2  A  04-05-2023  15  150.35
3  A  06-05-2023  25  250.00
4  B  02-05-2023  10  100.50
5  B  03-05-2023  20  200.00
6  B  04-05-2023  15  150.35
7  B  06-05-2023  25  250.00
  
Output Required:
*******************
    ID  Date        Mt  Values      Matched Values      Matched Dates
0   A   02-May-23   10  100.5       
1   A   03-May-23   20  200           100.5             02-May-23
2   A   04-May-23   15  150.35        100.5             02-May-23
3   A   06-May-23   25  250       100.5,200,150.35,    02-05-2023,03-05-
                                                      2023,04-05-2023                          
4   B   02-May-23   10  100.5       
5   B   03-May-23   20  200           100.5              02-May-23
6   B   04-May-23   15  150.35        100.5              02-May-23
7   B   06-May-23   25  250     100.5,200,150.35     02-05-2023,03-05-
                                                     2023,04-05-2023


Solution

  • This is a straightforward solution with df.iterrows():

    for i, row in df.iterrows():
        dfa = df.iloc[:i, :]  # slice with rows above current
        dfa = dfa[(dfa.ID==row.ID) & (dfa.MT < row.MT) & (dfa.Values < row.Values)] # matched rows
        df.loc[i, 'matched_values'] = ','.join(map(str, dfa.Values))
        df.loc[i, 'matched_dates'] = ','.join(map(str, dfa.Date))
    

    There may be a more elegant solution using df.expanding().apply(). Not sure if it will be much faster though.