python datetime pandas timedelta

Calculate datetime difference in years, months, etc. in a new pandas dataframe column


I have a pandas dataframe looking like this:

Name    start        end
A       2000-01-10   1970-04-29

I want to add a new column providing the difference between the start and end column in years, months, days.

So the result should look like:

Name    start        end          diff
A       2000-01-10   1970-04-29   29y9m etc.

The diff column may also be a datetime or a timedelta object; the key point for me is that I can easily get the year and month out of it.

What I have tried so far:

df['diff'] = df['end'] - df['start']

This results in the new column containing 10848 days. However, I do not know how to convert the days to 29y9m etc.


Solution

  • A simple function is enough to get there.

    It converts the day count of the timedelta into whole years and whole months using plain arithmetic, so the result is an approximation rather than an exact calendar difference.

    import datetime

    import pandas as pd

    def parse_date(td):
        # Approximation based on average lengths: 365.25 days per year, 365.25 / 12 days per month.
        years, rem_days = divmod(td.days, 365.25)  # whole years and the leftover days
        months = int(rem_days // (365.25 / 12))    # whole months contained in the leftover days
        return str(int(years)) + "Y" + str(months) + "m"

    df = pd.DataFrame([("2000-01-10", "1970-04-29")], columns=["start", "end"])
    # Parse both strings, subtract end from start, and format the resulting timedelta.
    df["delta"] = [parse_date(datetime.datetime.strptime(start, '%Y-%m-%d') - datetime.datetime.strptime(end, '%Y-%m-%d')) for start, end in zip(df["start"], df["end"])]
    print(df)

            start         end  delta
    0  2000-01-10  1970-04-29  29Y8m
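
If an exact calendar difference is needed rather than an approximation from the day count, one option is `relativedelta` from the `dateutil` package, which is installed as a dependency of pandas. The following is a minimal sketch under that assumption; the helper name `calendar_diff` and the `29y8m12d` output format are illustrative choices, not part of the original answer.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

def calendar_diff(later, earlier):
    """Format the calendar difference later - earlier, e.g. '29y8m12d'.

    Hypothetical helper for illustration only.
    """
    rd = relativedelta(later, earlier)  # exposes .years, .months, .days
    return f"{rd.years}y{rd.months}m{rd.days}d"

df = pd.DataFrame({"Name": ["A"], "start": ["2000-01-10"], "end": ["1970-04-29"]})

# Convert the string columns to real datetimes first.
df["start"] = pd.to_datetime(df["start"])
df["end"] = pd.to_datetime(df["end"])

# In this data "start" is the later date, so it goes first.
df["diff"] = [calendar_diff(s, e) for s, e in zip(df["start"], df["end"])]
print(df)  # the diff column contains 29y8m12d
```

For this row the exact calendar difference is 29 years, 8 months and 12 days. The `years`, `months` and `days` attributes of the `relativedelta` object can also be stored as separate integer columns if the numbers are needed individually.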