python pandas datetime type-conversion

How to convert pandas.tseries.offsets.Day datatype to integer dtype


I have a dataframe that looks like the following:

[Image: data]

I want the values in the last column to be plain integers. That column is the difference of two columns of pandas._libs.tslibs.period.Period dtype, and each element in it is a pandas.tseries.offsets.Day object.

How do I do that?
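A minimal sketch reproducing the situation described above, with made-up dates and assuming columns named ExaminDate and DeathDate (the names used in the answer). Subtracting two daily Period columns gives an object column of Day offsets; each offset stores its multiple in the n attribute, so reading that attribute yields plain integers:

```python
import pandas as pd

# Hypothetical sample data; column names are assumptions borrowed from the answer
data_dates = pd.DataFrame({
    "ExaminDate": pd.to_datetime(["2020-01-15", "2020-02-01"]),
    "DeathDate": pd.to_datetime(["2020-04-20", "2020-02-11"]),
})

# Convert both columns to daily periods, as in the question
examin_p = data_dates["ExaminDate"].dt.to_period("D")
death_p = data_dates["DeathDate"].dt.to_period("D")

# Period minus Period gives an object column holding Day offsets (e.g. <96 * Days>)
diff = death_p - examin_p

# Each Day offset keeps its day count in .n; collecting it gives an int64 column
diff_int = diff.apply(lambda offset: offset.n)
print(diff_int.tolist())  # [96, 10]
```

This works if you must keep the Period columns, but the answer's point stands: computing the difference on the datetime columns avoids the offset objects altogether.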


Solution

  • It's unclear why you chose to convert your dates to time periods in the first place; doing so gets in the way of your goal of calculating the time difference (in days) between two dates. The following two lines should, therefore, be removed:

    data_dates['ExaminMY']= data_dates['ExaminDate'].dt.to_period('D')
    data_dates['DeathMY']= data_dates['DeathDate'].dt.to_period('D')
    

    Explanation: with Period objects, there's no clear definition of the time difference (in days or otherwise) between two time periods (e.g. Q4 2019 and Q1 2020). You could be referring to the start date, the end date, or some combination of the two. Plus, periods (offsets, really) like '1 month' or '1 quarter' can differ in the number of days they contain.

    If what you're interested in is the time difference, in days, between DeathDate and ExaminDate, just do the calculation on the original datetime columns:

    import pandas as pd

    # You probably don't need the next three lines, since you're reading the dates
    # from a file; they only make this example self-contained.
    df = pd.DataFrame({"ExamineDate": ["2020-01-15"], "DeathDate": ["2020-04-20"]})
    df["ExamineDate"] = pd.to_datetime(df["ExamineDate"])
    df["DeathDate"] = pd.to_datetime(df["DeathDate"])

    # This is where the real work begins: subtracting datetimes gives a timedelta column
    df["days_diff"] = df["DeathDate"] - df["ExamineDate"]
    # .dt.days extracts the whole number of days as integers
    df["days_diff_int"] = df["days_diff"].dt.days
    print(df)
    

    The result is:

      ExamineDate  DeathDate days_diff  days_diff_int
    0  2020-01-15 2020-04-20   96 days             96
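To make the point about Period differences concrete, here is a small sketch using the Q4 2019 / Q1 2020 example from the explanation (the variable names are illustrative). The Period difference is simply "one quarter", yet counting days between the start dates and between the end dates gives different answers, because Q4 2019 has 92 days while Q1 2020 has 91:

```python
import pandas as pd

q4_2019 = pd.Period("2019Q4", freq="Q")
q1_2020 = pd.Period("2020Q1", freq="Q")

# Period arithmetic yields a quarter offset, not a number of days
quarter_diff = q1_2020 - q4_2019
print(quarter_diff.n)  # 1

# Measuring from the start dates vs. the end dates disagrees
start_gap = (q1_2020.start_time - q4_2019.start_time).days
end_gap = (q1_2020.end_time - q4_2019.end_time).days
print(start_gap, end_gap)  # 92 91
```

With plain datetimes there is no such ambiguity: the difference is a single timedelta, which is why the calculation above works directly on the original columns.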