I have a pandas dataframe that looks like this:
Name start end
A 2000-01-10 1970-04-29
I want to add a new column giving the difference between the start and end columns in years, months, and days.
So the result should look like:
Name start end diff
A 2000-01-10 1970-04-29 29y9m etc.
The diff column may also be a datetime object or a timedelta object, but the key point for me is that I can easily get the year and month out of it.
What I have tried so far is:
df['diff'] = df['start'] - df['end']
This results in the new column containing 10848 days. However, I do not know how to convert the days to 29y9m etc.
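A small helper function is enough to reach your goal.
It converts the day count of the timedelta into years and months, approximating a year as 364 days and a month as 30 days, so the result is an estimate rather than an exact calendar difference.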
import datetime
import pandas as pd

def parse_date(td):
    # Approximation: a year is treated as 364 days and a month as 30 days.
    resYear = float(td.days) / 364.0  # number of years, including the fractional part
    resMonth = int((resYear - int(resYear)) * 364 / 30)  # fractional year -> days -> whole months
    resYear = int(resYear)
    return str(resYear) + "Y" + str(resMonth) + "m"

df = pd.DataFrame([("2000-01-10", "1970-04-29")], columns=["start", "end"])
# Parse both strings, subtract them to get a timedelta, and format it row by row.
df["delta"] = [
    parse_date(datetime.datetime.strptime(start, '%Y-%m-%d') - datetime.datetime.strptime(end, '%Y-%m-%d'))
    for start, end in zip(df["start"], df["end"])
]
print(df)
start end delta
0 2000-01-10 1970-04-29 29Y9m
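Because the helper above divides by a fixed 364 and 30 days, it can drift from the true calendar difference by a month or so. If an exact years/months/days breakdown matters, one possible alternative is to count whole calendar months first and then the leftover days. The sketch below uses only the standard library; `add_months` and `calendar_diff` are hypothetical helper names introduced for illustration, not part of pandas or of the answer above.

```python
import calendar
from datetime import date


def add_months(d, n):
    """Return d shifted by n calendar months, clamping the day to the month's length."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) + n, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def calendar_diff(earlier, later):
    """Return (years, months, days) between two dates, with earlier <= later."""
    if earlier > later:
        raise ValueError("earlier must not be after later")
    # Count whole months, stepping back by one if that overshoots `later`.
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if add_months(earlier, months) > later:
        months -= 1
    # Whatever remains after the whole months is the day component.
    days = (later - add_months(earlier, months)).days
    years, months = divmod(months, 12)
    return years, months, days


# The dates from the question: 1970-04-29 up to 2000-01-10.
print("%dy%dm%dd" % calendar_diff(date(1970, 4, 29), date(2000, 1, 10)))  # 29y8m12d
```

For the dates in the question this gives 29 years, 8 months and 12 days, one month less than the 364/30 estimate. Assuming both columns have been converted with `pd.to_datetime`, the function could be applied row by row, for example with `df.apply(lambda r: calendar_diff(r["end"].date(), r["start"].date()), axis=1)`.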