Search code examples
pythonpandasdate-formattingpython-datetime

How to resolve conflicting of pandas.to_datetime with python's built-in datetime?


Initially I've got

print(transactions.Date.loc[0])
>>>30/04/18

First I've done

transactions['Date'] = pd.to_datetime(transactions['Date'])

Now

print(transactions.Date.loc[0])
>>>2018-04-30 00:00:00

After that I check for an equality condition like so

transactions.Date.loc[0] + datetime.timedelta(days=1) == transactions.Date.loc[1]
>>>False

So I examine

print(transactions.Date.loc[0] + datetime.timedelta(days=1))
>>>2018-05-01 00:00:00 
print(transactions.Date.loc[1])
>>>2018-01-05 00:00:00

Turns out both refer to same time, but with a different format. How do I modify my code to get True on equality?

EDIT: As pointed in the comments, the problem is actually May 1 being transformed to January 5 when I use pd.to_datetime. So the new question is how do I resolve this?


Solution

  • The code worked fine, are you sure that the difference between transactions.Date.loc[0] andtransactions.Date.loc[1] is exactly 1 day?

    >>> l = pd.to_datetime(['2000/23/11', '2000/24/11'], dayfirst=True)
    >>> l[0] + datetime.timedelta(days=1) == l[1]
    True
    >>> print(l[0], l[0] + datetime.timedelta(days=1), l[1], sep='\t')
    2000-11-23 00:00:00     2000-11-24 00:00:00     2000-11-24 00:00:00
    

    Edit: using your data i've found the error:

    with dayfirst=True works well

    >>> l = pd.to_datetime(['30/4/20', '1/5/20'], dayfirst=True)
    >>> l[0] - l[1]
    Timedelta('-1 days +00:00:00')
    

    without it the conversion format is different between the 2 dates

    >>> l = pd.to_datetime(['30/4/20', '1/5/20'])
    >>> l[0] - l[1]
    Timedelta('116 days 00:00:00')