Search code examples
pythonpandasdatedatetime

Convert TZ to date and get the date difference in python


I have a dataset where the dates are all in yyyy-mm-ddT00:00:00.000Z format. I need to calculate the difference between the two date fields.

I have tried using dt.days (below) to perform the difference equation but I got a type Error.

TypeError: unsupported operand type(s) for -: 'str' and 'str'

I have also tried to use dt.datetime.strptime(df['columnname'], "%y%m%d").date(). Any other ideas of what I can do?

# import packages
# python
import pandas as pd
import numpy as np
import datetime as dt
from datetime import timedelta
from datetime import datetime
import os
import tempfile
import pymssql
from dateutil.relativedelta import relativedelta
import tempfile

# initialize data of lists.
data = {'ID': ['089', '983', '037', '654'],
        'Schedule_Date': ['2024-02-06T00:00:00.000Z', '2024-03-17T00:00:00.000Z', '2024-02-02T00:00:00.000Z', '2024-08-14T00:00:00.000Z'],
        'Out_Date': ['2024-02-08T00:00:00.000Z', '2024-04-27T00:00:00.000Z', '2024-05-24T00:00:00.000Z', '2024-02-26T00:00:00.000Z']}
 
# Creates pandas DataFrame.
df = pd.DataFrame(data, index=['rank1',
                               'rank2',
                               'rank3',
                               'rank4'])

df['days_alert']= (df['Out_Date'] - df['Schedule_Date']).dt.days

Solution

  • The to_datetime function is your friend here. Your date columns are just strings. To do date arithmetic, they have to be converted to date/time types.

    import pandas as pd
    
    # initialize data of lists.
    data = {'ID': ['089', '983', '037', '654'],
            'Schedule_Date': ['2024-02-06T00:00:00.000Z', '2024-03-17T00:00:00.000Z', '2024-02-02T00:00:00.000Z', '2024-08-14T00:00:00.000Z'],
            'Out_Date': ['2024-02-08T00:00:00.000Z', '2024-04-27T00:00:00.000Z', '2024-05-24T00:00:00.000Z', '2024-02-26T00:00:00.000Z']}
     
    # Creates pandas DataFrame.
    df = pd.DataFrame(data, index=['rank1',
                                   'rank2',
                                   'rank3',
                                   'rank4'])
    
    df['Schedule_Date'] = pd.to_datetime(df['Schedule_Date'])
    df['Out_Date'] = pd.to_datetime(df['Out_Date'])
    df['days_alert']= (df['Out_Date'] - df['Schedule_Date']).dt.days
    print(df)
    

    Output:

            ID             Schedule_Date                  Out_Date  days_alert
    rank1  089 2024-02-06 00:00:00+00:00 2024-02-08 00:00:00+00:00           2
    rank2  983 2024-03-17 00:00:00+00:00 2024-04-27 00:00:00+00:00          41
    rank3  037 2024-02-02 00:00:00+00:00 2024-05-24 00:00:00+00:00         112
    rank4  654 2024-08-14 00:00:00+00:00 2024-02-26 00:00:00+00:00        -170