Below is a DataFrame with two sets of values, A and B, in the 'ID' column. For each row, I need to find the smaller values from the previous rows in the same 'MT' and 'Price' columns; if there is a price value greater than the current price value, the loop should break and apply the same condition to the following rows. The matched values need to be populated in separate columns, as shown in the sample output below.
import pandas as pd

data = {'ID': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B'],
        'Date': ['01-05-2023', '02-05-2023', '03-05-2023', '04-05-2023', '06-05-2023', '07-05-2023',
                 '01-05-2023', '02-05-2023', '03-05-2023', '04-05-2023', '06-05-2023', '07-05-2023'],
        'MT': [2, 5, 10, 15, 20, 30, 2, 5, 10, 15, 20, 30],
        'Price': [300, 100.5, 200, 150.35, 250, 90, 300, 100.5, 200, 150.35, 250, 90]}
df = pd.DataFrame(data)
print(df)
Available DataFrame:
*******************
ID Date MT Price
0 A 01-05-2023 2 300.00
1 A 02-05-2023 5 100.50
2 A 03-05-2023 10 200.00
3 A 04-05-2023 15 150.35
4 A 06-05-2023 20 250.00
5 A 07-05-2023 30 90.00
6 B 01-05-2023 2 300.00
7 B 02-05-2023 5 100.50
8 B 03-05-2023 10 200.00
9 B 04-05-2023 15 150.35
10 B 06-05-2023 20 250.00
11 B 07-05-2023 30 90.00
Output Required:
*******************
ID Date MT Price Matched_Price Date_Values
0 A 01-05-2023 2 300
1 A 02-05-2023 5 100.5
2 A 03-05-2023 10 200 100.5 02-05-2023
3 A 04-05-2023 15 150.35
4 A 06-05-2023 20 250 100.5,200,150.35 02-05-2023,03-05-2023,04-05-2023
5 A 07-05-2023 30 90
6 B 01-05-2023 2 300
7 B 02-05-2023 5 100.5
8 B 03-05-2023 10 200 100.5 02-05-2023
9 B 04-05-2023 15 150.35
10 B 06-05-2023 30 250 100.5,200,150.35 02-05-2023,03-05-2023,04-05-2023
11 B 07-05-2023 30 90
The code provided below works fine for identifying all the smaller values in the column, but the second part of the condition still needs to be applied, i.e. if there is a price value greater than the current price value, the loop should break and perform this condition for the following rows.
for i, row in df.iterrows():
    dfa = df.iloc[:i, :]  # slice with rows above current
    dfa = dfa[(dfa.ID == row.ID) & (dfa.MT < row.MT) & (dfa.Price < row.Price)]  # matched rows
    df.loc[i, 'matched_Price'] = ','.join(map(str, dfa.Price))
    df.loc[i, 'matched_dates'] = ','.join(map(str, dfa.Date))
I usually prefer to do these things within the .apply() function rather than in a loop. It's not wrong to use a loop, but I find it can make things harder to deal with in the pandas world, and it's usually slower than apply or other, more vectorized solutions.
Here is what I came up with:
import pandas as pd
data = {'ID': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B'],
        'Date': ['01-05-2023', '02-05-2023', '03-05-2023', '04-05-2023', '06-05-2023', '07-05-2023',
                 '01-05-2023', '02-05-2023', '03-05-2023', '04-05-2023', '06-05-2023', '07-05-2023'],
        'MT': [2, 5, 10, 15, 20, 30, 2, 5, 10, 15, 20, 30],
        'Price': [300, 100.5, 200, 150.35, 250, 90, 300, 100.5, 200, 150.35, 250, 90]}
df = pd.DataFrame(data)
# Store in a copy so we can manipulate it and maintain the source
mod_df = df.copy()
# Shift gives us a series "pushed" along one entry. Allows comparison with the previous entry
mod_df["shift"] = df["Price"].shift()
# diff is where we will evaluate the dataframe. Where this diff is positive within
# a product ID is the rows you want to focus on.
mod_df["diff"] = mod_df["Price"] - mod_df["shift"]
# Function for doing the heavy lifting
def pull_values_w_price_less_than(row: pd.Series, column: str):
    # The case that the price decreased or stayed the same
    if row["diff"] <= 0:
        return pd.NA
    # Price increased (where we want to evaluate)
    else:
        # Entries relevant to this row
        relevant_df = mod_df[
            # Product ID is the same
            (mod_df["ID"] == row["ID"])
            # Price is less than this one
            & (mod_df["Price"] < row["Price"])
            # And it happened before this one
            & (mod_df["Date"] < row["Date"])
        ]
        # The matched values (Price or Date, depending on the passed column
        # argument) placed in a list
        matched_values = relevant_df[column].to_list()
        # If nothing was matched, return NA. This happens at ID boundaries,
        # where the shift compares against the previous ID's price, so we can
        # land in this branch with nothing to match, and we don't want to
        # return an empty list.
        if len(matched_values) == 0:
            return pd.NA
        # Return the values to put in the row
        return matched_values
# Do this for the prices
mod_df["Matched_Price"] = mod_df.apply(
pull_values_w_price_less_than, args=("Price",), axis=1
)
# And the dates
mod_df["Date_Values"] = mod_df.apply(
pull_values_w_price_less_than, args=("Date",), axis=1
)
# Remove our helper columns
output = mod_df.drop(columns=["shift", "diff"])
# View the df (optional ofc)
output
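As a side note, if you want the diff step to respect the ID boundaries directly instead of relying on the empty-match check, you could swap the shift/diff lines above for a groupby-based diff. Just a sketch of that variant, operating on the same mod_df:

# Sketch: compute the price change within each ID group, so the first row of
# each group gets NaN instead of being compared against the previous group's price.
mod_df["diff"] = mod_df.groupby("ID")["Price"].diff()
# NaN <= 0 evaluates to False, so the first row of each group still falls into
# the "price increased" branch and comes back as pd.NA via the empty-match check.

The rest of the code should produce the same result for this data, and the "shift" helper column isn't needed at all in this variant.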
I also put the values into a list rather than a string separated by commas, but you can easily convert those columns using this function:
def list_to_comma_string(value_list) -> str:
    # If it was already null we can't change that
    if value_list is pd.NA:
        return pd.NA
    # List comprehension to convert every value in the list to a string
    joiner = [str(x) for x in value_list]
    # join() is an efficient way to combine a list of strings in Python
    return ','.join(joiner)
# Convert both columns
output['Matched_Price'] = output['Matched_Price'].apply(list_to_comma_string)
output['Date_Values'] = output['Date_Values'].apply(list_to_comma_string)
# Again view the df
output
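If you would rather see blank cells (as in the required output) instead of <NA>, one optional final step is to fill the missing entries with empty strings; a minimal sketch:

# Optional: show unmatched rows as blank cells instead of <NA>,
# to match the layout of the required output.
output[["Matched_Price", "Date_Values"]] = output[["Matched_Price", "Date_Values"]].fillna("")
output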
Hope this helps!