Search code examples
pythonpandasdataframecalculated-columns

calculate new column values based on conditions in pandas


I have columns in the pandas dataframe df_profit:

    profit_date   profit
0   01.04         70    
1   02.04         80    
2   03.04         80    
3   04.04         100   
4   05.04         120   
5   06.04         120   
6   07.04         120   
7   08.04         130   
8   09.04         140   
9   10.04         140   

And I have the second dataframe df_deals:

    deals_date
0   03.04
1   05.04
2   06.04

I want to create a new column 'delta' in the df_profit and let it be equal to delta between current value and previous value in 'profit' column. But I want the delta to be calculated only after the first date in the 'profit_date' is equal to the date in the column 'deal_date' of df_deals dataframe and previous value in the delta calculation to be always the same and equal to the value when the first date in 'profit_date' was equal to the first date in 'deals_date'.

So, the result would look like:

    profit_date   profit   delta
0   01.04         70       
1   02.04         80       
2   03.04         80       
3   04.04         100      20
4   05.04         120      40
5   06.04         120      40
6   07.04         120      40
7   08.04         130      50
8   09.04         140      60
9   10.04         140      60

Solution

  • For the next time you should provide better data to make it easier to help (dataframe creation so that we can copy paste your code).

    I think this codes does what you want:

    import pandas as pd
    
    df_profit = pd.DataFrame(columns=["profit_date", "profit"],
                             data=[
                                 ["01.04", 70],
                                 ["02.04", 80],
                                 ["03.04", 80],
                                 ["04.04", 100],
                                 ["05.04", 120],
                                 ["06.04", 120],
                                 ["07.04", 120],
                                 ["08.04", 130],
                                 ["09.04", 140],
                                 ["10.04", 140]])
    
    df_deals = pd.DataFrame(columns=["deals_date"], data=["03.04", "05.04", "06.04"])
    
    # combine both dataframes, based on date columns
    df = df_profit.merge(right=df_deals, left_on="profit_date", right_on="deals_date", how="left")
    
    # find the first value (first row with deals date) and set it to 'base'
    df["base"] = df.loc[df["deals_date"].first_valid_index()]["profit"]
    
    # calculate delta
    df["delta"] = df["profit"] - df["base"]
    
    # Remove unused values
    df.loc[:df["deals_date"].first_valid_index(), "delta"] = None
    
    # remove temporary cols
    df.drop(columns=["base", "deals_date"], inplace=True)
    
    print(df)
    
    

    output is:

      profit_date  profit  delta
    0       01.04      70    NaN
    1       02.04      80    NaN
    2       03.04      80    NaN
    3       04.04     100   20.0
    4       05.04     120   40.0
    5       06.04     120   40.0
    6       07.04     120   40.0
    7       08.04     130   50.0
    8       09.04     140   60.0
    9       10.04     140   60.0