Search code examples
pythonpandaspython-2.7python-datetime

How to convert datetime to timestamp and calculate difference between dates using lambda function


I need to convert a variable i created into a timestamp from a datetime.

I need it in a timestamp format to perform a lambda function against my pandas series, which is stored as a datetime64. The lambda function should find the difference in months between startDate and the entire pandas series. Please help?

I've tried using relativedelta to calculate the difference in months but I'm not sure how to implement it with a pandas series.

from datetime import datetime 
import pandas as pd
from dateutil.relativedelta  import relativedelta as rd

#open the data set and store in the series ('df')
file = pd.read_csv("test_data.csv")
df = pd.DataFrame(file)

#extract column "AccountOpenedDate into a data frame"
open_date_data = pd.Series.to_datetime(df['AccountOpenedDate'], format = '%Y/%m/%d')

#set the variable startDate
dateformat = '%Y/%m/%d %H:%M:%S'
set_date  = datetime.strptime('2017/07/01 00:00:00',dateformat)
startDate = datetime.timestamp(set_date)

#This function calculates the difference in months between two dates: ignore 

def month_delta(start_date, end_date):
    delta = rd(end_date, start_date)
    # >>> relativedelta(years=+2, months=+3, days=+28)
    return 12 * delta.years + delta.months

d1 = datetime(2017, 7, 1)
d2 = datetime(2019, 10, 29)
total_months = month_delta(d1, d2)

# Apply a lambda function to each row by adding 5 to each value in each column
dfobj = open_date_data.apply(lambda x: x + startDate)
print(dfobj) 

I'm only using a single column from the loaded data set. It's a date column in the following format ("%Y/%m/%d %H:%M:%S"). I want to find the difference in months between startDate and all the dates in the series.


Solution

  • As I don't have your original csv, I've made up some sample data and hopefully managed to shorten your code quite a bit:

    open_date_data = pd.Series(pd.date_range('2017/07/01', periods=10, freq='M')) 
    startDate = pd.Timestamp("2017/07/01")
    

    Then, with help from this answer to get the appropriate month_diff formula:

    def month_diff(a, b):
        return 12 * (a.year - b.year) + (a.month - b.month)
    
    open_date_data.apply(lambda x: month_diff(x, startDate))