Search code examples
pythonexcelpandasdatedelta

Delta between two dates in Excel return negative number


I have a script to return the days difference between today's date and a date in an Excel file.

For some reason, for dates that are single numbered, I get a negative number.

For example:

Date : num days diffrence from today

4/7/2019 : -72 (wrong)

5/7/2019 : -42 (wrong)

20/8/2019 : 63 (correct)

30/6/2019 : 12 (correct)

The results are true to the day posting this question (17/6/2019)

I checked the rest of the 100 dates in my Excel files, and this behavior only happens on single numbered dates. For example: 5/7/2019 (July 5th 2019) or 3/10/2019 (October 3rd 2019).

This is my code:

import pandas as pd
import datetime as dt

file_name = pd.read_excel (r'Changes log.xlsx')
df = pd.DataFrame(file_name, columns= ['Due Date'])

today = pd.Timestamp.today()
df['Due Date'] = pd.to_datetime(df['Due Date'])
delta = (df['Due Date'] - today).dt.days
print(delta)

Note: df['Due Date] contains the dates in the Excel file. Which are formatted by %d/%m/%Y

Any help would be great


Solution

  • Use the argument dayfirst=True:

    df = pd.read_excel('Changes log.xlsx')
    df.columns = ['Due Date']
    
    today = pd.Timestamp.today()
    df['Due Date'] = pd.to_datetime(df['Due Date'], dayfirst=True)
    delta = (df['Due Date'] - today).dt.days
    print(delta)
    

    Example with data:

    df = pd.DataFrame({'Date': ['4/7/2019', '5/7/2019', '20/8/2019', '30/6/2019']})
    df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
    
    delta = (df['Date'] - pd.Timestamp.today()).dt.days
    

    Output

    0    16
    1    17
    2    63
    3    12
    Name: Date, dtype: int64