I have a dataframe that looks like the following:
I want the values of the last column to be simply integers. That column is the difference of two columns of pandas._libs.tslibs.period.Period dtype, and each element in it is of pandas.tseries.offsets.Day dtype.
How do I do that?
It's unclear why you chose to convert your dates to time periods in the first place. Doing so gets in the way of calculating the time difference (in days) between two dates. The following two lines should, therefore, be removed:
data_dates['ExaminMY']= data_dates['ExaminDate'].dt.to_period('D')
data_dates['DeathMY']= data_dates['DeathDate'].dt.to_period('D')
Explanation: with Period objects, there's no clear definition of the time difference (in days or otherwise) between two time periods (e.g. Q4 2019 and Q1 2020). You could be referring to the start dates, the end dates, or some combination of the two. Plus, periods (offsets, really) like '1 month' or '1 quarter' can differ in the number of days they contain.
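To make the ambiguity concrete, here is a small sketch using the two quarters mentioned above. The variable names are only illustrative. Subtracting the periods counts steps of the period frequency (quarters), while the number of days depends on whether you measure from start to start or from end to end:

```python
import pandas as pd

q4_2019 = pd.Period("2019Q4")
q1_2020 = pd.Period("2020Q1")

# Period - Period counts steps of the period's own frequency (quarters here), not days
quarters_apart = (q1_2020 - q4_2019).n

# The difference in days depends on which endpoints you pick
start_to_start = (q1_2020.start_time - q4_2019.start_time).days
end_to_end = (q1_2020.end_time - q4_2019.end_time).days

print(quarters_apart, start_to_start, end_to_end)
```

The two day counts disagree because Q4 2019 and Q1 2020 do not contain the same number of days.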
If what you're interested in is the time difference, in days, between DeathDate and ExaminDate, just do the calculation on the original datetime columns:
import pandas as pd

# I don't think you need these three lines, as you're reading the dates from a file. They're
# just here to make sure the example works.
df = pd.DataFrame({"ExamineDate": ["2020-01-15"], "DeathDate": ["2020-04-20"]})
df["ExamineDate"] = pd.to_datetime(df["ExamineDate"])
df["DeathDate"] = pd.to_datetime(df["DeathDate"])
# This is where the real stuff begins
df["days_diff"] = df["DeathDate"] - df["ExamineDate"]  # timedelta64 column
df["days_diff_int"] = df["days_diff"].dt.days  # whole days as plain integers
print(df)
The result is:
  ExamineDate  DeathDate days_diff  days_diff_int
0  2020-01-15 2020-04-20   96 days             96
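If, for some reason, the Period columns have to stay, the Day offsets from the question can probably still be turned into integers through each offset's n attribute. This is only a sketch: it assumes daily ('D') periods and no missing dates, and it rebuilds the columns from the question with made-up data.

```python
import pandas as pd

# Made-up data standing in for the question's dataframe
df = pd.DataFrame({
    "ExaminDate": pd.to_datetime(["2020-01-15"]),
    "DeathDate": pd.to_datetime(["2020-04-20"]),
})
exam_period = df["ExaminDate"].dt.to_period("D")
death_period = df["DeathDate"].dt.to_period("D")

# Subtracting Period columns yields an object-dtype Series of Day offsets
offsets = death_period - exam_period

# Each offset stores its multiple in .n, so <96 * Days> becomes 96
days_int = offsets.apply(lambda offset: offset.n)
print(days_int.tolist())
```

This would break on rows where either date is missing, which is one more reason to prefer the datetime subtraction above.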