Search code examples
pythonpandasdataframe

How to calculate the number of days between two rows of the same column?


I have this data:

enter image description here

I need to create a new column in python that returns the number of days between the End Date of the task Technical Analysis and the End Date of the task Approval for the different Project Id's.

I've tried this approach:

approval_date = df.loc[df['Task Name'] == 'Approval', 'End Date'].values[0]

technical_analysis_date = df.loc[df['Task Name'] == 'Technical Analysis', 'End Date'].values[0]

nr_days_negotiation = (pd.to_datetime(approval_date) - pd.to_datetime(technical_analysis_date)).days

df['Nr_days_negotiation'] = nr_days_negotiation

But it's returning the same value of number of days for all the different project id's...

What I want to achieve is this:

enter image description here

How can I do this?

Also, what about if I had more tasks in the column Task Name, and I needed to create the column Nr_days_Negotiation that calculates the number of days between Technical Analysis and Approval tasks only for each Project Id?

(This is the example data in this case): Example data


Solution

  • Code

    g = pd.to_datetime(df['End Date'], format='%d/%m/%Y').groupby(df['Project Id'])
    df['Nr_days_negotiation'] = g.transform('max').sub(g.transform('min')).dt.days
    

    df

      Project Id           Task Name    End Date  Nr_days_negotiation
    0   Q6730168            Approval  13/11/2023                   10
    1   Q6730168  Technical Analysis  03/11/2023                   10
    2   Q6737180  Technical Analysis  05/02/2024                   22
    3   Q6737180            Approval  27/02/2024                   22
    4   Q6747678            Approval  30/11/2023                   14
    5   Q6747678  Technical Analysis  16/11/2023                   14
    6   Q6732796  Technical Analysis  20/11/2023                   21
    7   Q6732796            Approval  11/12/2023                   21
    

    Example Code

    import pandas as pd
    data = {
        "Project Id": ["Q6730168", "Q6730168", "Q6737180", "Q6737180", "Q6747678", "Q6747678", "Q6732796", "Q6732796"],
        "Task Name": ["Approval", "Technical Analysis", "Technical Analysis", "Approval", "Approval", "Technical Analysis", "Technical Analysis", "Approval"],
        "End Date": ["13/11/2023", "03/11/2023", "05/02/2024", "27/02/2024", "30/11/2023", "16/11/2023", "20/11/2023", "11/12/2023"]
    }
    
    df = pd.DataFrame(data)