Search code examples
pythonpandasdatedatetimedate-difference

Python: monthly difference between two dates


I have two date fields that I want to get the monthly difference on a 30-year loan term. The dates are actually in yyyy-mm-dd format in my dataset, however, when I check the dtype in Python it says both are objects. The sample data is below:

|End Date  |StartDate|
|----------|---------|
|2/1/2018  |9/30/2016|
|1/1/2024  |9/30/2016|
|10/1/2023 |9/30/2016|
|4/1/2021  |9/30/2016|
|8/1/2030  |9/30/2016|
|10/1/2018 |9/30/2016|
|11/1/2016 |9/30/2016|
|7/1/2018  |9/30/2016|
|8/1/2020  |9/30/2016|
|2/1/2018  |9/30/2016|
|4/1/2018  |9/30/2016|

My code below works:

diff = (pd.to_datetime(df['End Date'], format = '%Y-%m-%d') -  pd.to_datetime(df['Start Date'], format = '%Y-%m-%d'))/30

Huge BUT--the output looks like this:

16 days 07:12:00
88 days 07:12:00
85 days 05:36:00
54 days 19:12:00
168 days 10:24:00
24 days 08:48:00
1 days 01:36:00
21 days 07:12:00
46 days 16:48:00
16 days 07:12:00
18 days 06:24:00

How do I remove 'days' and everything after so only the value shows? How do I convert the value to an integer?


Solution

  • You just need diff.dt.days

    df = pd.DataFrame({
        'End Date': ['2/1/2018', '1/1/2024'], 
        'Start Date': ['9/30/2016', '9/30/2016'],
    })
    
    #the format = '%Y-%m-%d' doesn't match your example data (just letting pandas figure it out)
    df['End Date'] = pd.to_datetime(df['End Date'])
    df['Start Date'] = pd.to_datetime(df['Start Date'])
    
    diff = (df['End Date']-df['Start Date'])/30
    diff.dt.days