Search code examples
pythonpandasdataframenumpydate-difference

How to do conditional calculations in Python


I have a sample data that looks like this.

Column DateDuration was calculated in Excel, following below logic:

  • DateDuration between SecondDate and FirstDate >= 28, then DateDuration = SecondDate - FirstDate.
  • DateDuration between SecondDate and FirstDate <28, if ThirdDate = nan, then DateDuration = SecondDate - FirstDate.
  • DateDuration between SecondDate and FirstDate <28, if ThirdDate = not nan, then consider (ThirdDate - FirstDate):
    • ThirdDate - FirstDate >= 28, then DateDuration = ThirdDate - FirstDate.
    • ThirdDate - FirstDate < 28, if FourthDate = nan, then DateDuration = ThirdDate - FirstDate.
    • ThirdDate - FirstDate < 28, if FourthDate = not nan, then DateDuration = FourthDate - FirstDate.

I would like to calculate DateDuration in Python but do not know how to do about this.

Types of data in Python:

  • ID int64
  • FirstDate object
  • SecondDate object
  • ThirdDate object
  • FourthDate object

I am new to Python. Any help would be greatly appreciated!!

import pandas as pd
import numpy as np
df['FirstDate'] = pd.to_datetime(df['FirstDate'])
df['SecondDate'] = pd.to_datetime(df['SecondDate'])
df['DayDifference2'] = (df['SecondDate']) -(df['FirstDate'])
df['DayDifference3'] = (df['ThirdDate']) -(df['FirstDate'])
df['DayDifference4'] = (df['FourthDate']) -(df['FirstDate'])
    
x = df['DayDifference2'].dt.days
y = df['DayDifference3'].dt.days
z = df['DayDifference4'].dt.days

condlist = [x<28, x>=28]
choicelist = [(df['ThirdDate']) -(df['FirstDate']), (df['SecondDate']) -(df['FirstDate'])]
np.select(condlist, choicelist)

My data:

ID FirstDate SecondDate ThirdDate FourthDate DateDuration
2914300 2021-09-23 2021-10-07 2021-11-29 2021-12-20 67
3893461 2021-09-08 2021-10-06 2022-04-07 211
4343075 2021-06-23 2021-09-27 96
4347772 2021-06-23 2021-09-27 96
4551963 2021-08-02 2021-10-14 2022-03-11 73
4893324 2021-09-30 2021-10-01 2022-03-03 2022-03-10 154
5239991 2021-06-24 2021-08-26 2021-09-25 2022-02-03 63
8454947 2021-09-28 2021-10-05 7
8581390 2021-09-27 2022-03-21 2022-03-25 175
8763766 2021-09-20 2021-10-04 2021-12-09 80
9144185 2021-06-18 2021-06-23 5
9967685 2021-09-13 2021-10-29 2022-02-07 2022-03-23 46
11367560 2021-08-31 2021-09-28 2021-10-21 2022-02-11 51

Solution

  • import pandas as pd
    import numpy as np
    
    df = pd.read_csv('date_example.csv')
    df.loc[:,'FirstDate':'FourthDate'] = df.loc[:,'FirstDate':'FourthDate'].astype('datetime64[ns]')
    df
    

    enter image description here NaT is a missing value of datetime64[ns] type




    Conditions & Choices

    conditions = [
        (df['SecondDate'] - df['FirstDate']).dt.days >= 28,
        ((df['SecondDate'] - df['FirstDate']).dt.days < 28) & df['ThirdDate'].isna(),
        ((df['SecondDate'] - df['FirstDate']).dt.days < 28) & df['ThirdDate'].notna() & ((df['ThirdDate'] - df['FirstDate']).dt.days >= 28),
        ((df['SecondDate'] - df['FirstDate']).dt.days < 28) & df['ThirdDate'].notna() & ((df['ThirdDate'] - df['FirstDate']).dt.days < 28) & df['FourthDate'].isna(),
        ((df['SecondDate'] - df['FirstDate']).dt.days < 28) & df['ThirdDate'].notna() & ((df['ThirdDate'] - df['FirstDate']).dt.days < 28) & df['FourthDate'].notna()
    ]
    
    choices = [
        (df['SecondDate'] - df['FirstDate']).dt.days,
        (df['SecondDate'] - df['FirstDate']).dt.days,
        (df['ThirdDate'] - df['FirstDate']).dt.days,
        (df['ThirdDate'] - df['FirstDate']).dt.days,
        (df['FourthDate'] - df['FirstDate']).dt.days
    ]
    



    df['Duration'] = np.select(conditions, choices)
    df
    

    Result enter image description here



    Discussion: There are some differences, e.g., second row, ID = 3893461, according to your conditions(DateDuration between SecondDate and FirstDate >= 28, then DateDuration = SecondDate - FirstDate.), SecondDate - FirstDate of ID = 3893461 is 28, same thing happened on last row, ID = 11367560