I need to convert a variable i created into a timestamp from a datetime.
I need it in a timestamp format to perform a lambda function against my pandas series, which is stored as a datetime64. The lambda function should find the difference in months between startDate and the entire pandas series. Please help?
I've tried using relativedelta to calculate the difference in months but I'm not sure how to implement it with a pandas series.
from datetime import datetime
import pandas as pd
from dateutil.relativedelta import relativedelta as rd
#open the data set and store in the series ('df')
file = pd.read_csv("test_data.csv")
df = pd.DataFrame(file)
#extract column "AccountOpenedDate into a data frame"
open_date_data = pd.Series.to_datetime(df['AccountOpenedDate'], format = '%Y/%m/%d')
#set the variable startDate
dateformat = '%Y/%m/%d %H:%M:%S'
set_date = datetime.strptime('2017/07/01 00:00:00',dateformat)
startDate = datetime.timestamp(set_date)
#This function calculates the difference in months between two dates: ignore
def month_delta(start_date, end_date):
delta = rd(end_date, start_date)
# >>> relativedelta(years=+2, months=+3, days=+28)
return 12 * delta.years + delta.months
d1 = datetime(2017, 7, 1)
d2 = datetime(2019, 10, 29)
total_months = month_delta(d1, d2)
# Apply a lambda function to each row by adding 5 to each value in each column
dfobj = open_date_data.apply(lambda x: x + startDate)
print(dfobj)
I'm only using a single column from the loaded data set. It's a date column in the following format ("%Y/%m/%d %H:%M:%S"). I want to find the difference in months between startDate and all the dates in the series.
As I don't have your original csv, I've made up some sample data and hopefully managed to shorten your code quite a bit:
open_date_data = pd.Series(pd.date_range('2017/07/01', periods=10, freq='M'))
startDate = pd.Timestamp("2017/07/01")
Then, with help from this answer to get the appropriate month_diff
formula:
def month_diff(a, b):
return 12 * (a.year - b.year) + (a.month - b.month)
open_date_data.apply(lambda x: month_diff(x, startDate))