Search code examples
pythondataframemultiple-columnsassign

Assign new column in DataFrame based on if value is in a certain value range


I have two DataFrames as follows:

df_discount = pd.DataFrame(data={'Graduation' : np.arange(0,1000,100), 'Discount %' : np.arange(0,50,5)})
df_values = pd.DataFrame(data={'Sum' : [20,801,972,1061,1251]})

enter image description here enter image description here

Now my goal is to get a new column df_values['New Sum'] for my df_values that applies the corresponding discount to df_values['Sum'] based on the value of df_discount['Graduation']. If the Sum is >= the Graduation the corresponding discount is applied.

Examples: Sum 801 should get a discount of 40% resulting in 480.6, Sum 1061 gets 45% resulting in 583.55.

I know I could write a funtion with if else conditions and the returning values. However, is there a better way to do this if you have very many different conditions?


Solution

  • You could try if pd.merge_asof() works for you:

    df_discount = pd.DataFrame({
        'Graduation': np.arange(0, 1000, 100), 'Discount %': np.arange(0, 50, 5)
    })
    df_values = pd.DataFrame({'Sum': [20, 100, 101, 350, 801, 972, 1061, 1251]})
    
    df_values = (
        pd.merge_asof(
            df_values, df_discount,
            left_on="Sum", right_on="Graduation",
            direction="backward"
        )
        .assign(New_Sum=lambda df: df["Sum"] * (1 - df["Discount %"] / 100))
        .drop(columns=["Graduation", "Discount %"])
    )
    

    Result (without the last .drop(columns=...) to see what's happening):

        Sum  Graduation  Discount %  New_Sum
    0    20           0           0    20.00
    1   100         100           5    95.00
    2   101         100           5    95.95
    3   350         300          15   297.50
    4   801         800          40   480.60
    5   972         900          45   534.60
    6  1061         900          45   583.55
    7  1251         900          45   688.05