I'm trying to create a new pandas.DataFrame
column of the number of business days between two date columns. I'm unable to reference the dates in the date columns as arguments in a function call (I get a TypeError: Cannot convert input error). However, I'm able to zip the values in the series into a List and use a For Loop to reference the parameters. Ideally, I would prefer to create a GroupBy object from the two Date columns and calculate the difference.
import pandas as pd
df = pd.DataFrame.from_dict({'Date1': ['2017-05-30 16:00:00',
'2017-05-30 16:00:00',
'2017-05-30 16:00:00'],
'Date2': ['2017-06-16 16:00:00',
'2017-07-21 16:00:00',
'2017-08-18 16:00:00'],
'Value1': [2.97, 3.3, 4.03],
'Value2': [96L, 14L, 2L]})
df['Date1'] = pd.to_datetime(df['Date1'])
df['Date2'] = pd.to_datetime(df['Date2'])
df.dtypes
Date1 datetime64[ns]
Date2 datetime64[ns]
Value1 float64
Value2 int64
dtype: object
def date_diff(startDate, endDate):
return float(len(pd.bdate_range(startDate, endDate)) - 1)
df['DateDiff'] = date_diff(df['Date1'], df['Date2'])
TypeError: Cannot convert input [0 2017-05-30 16:00:00
1 2017-05-30 16:00:00
2 2017-05-30 16:00:00
Name: Date1, dtype: datetime64[ns]] of type <class 'pandas.core.series.Series'> to Timestamp
date_List = list(zip(df['Date1'], df['Date2']))
for i in range(len(date_List)):
df.loc[(df['Date1'] == date_List[i][0]) & (df['Date2'] == date_List[i][1]), 'diff'] = date_diff(date_List[i][0], date_List[i][1])
Date1 Date2 Value1 Value2 diff
0 2017-05-30 16:00:00 2017-06-16 16:00:00 2.97 96 13.0
1 2017-05-30 16:00:00 2017-07-21 16:00:00 3.30 14 38.0
2 2017-05-30 16:00:00 2017-08-18 16:00:00 4.03 2 58.0
grp = df.groupby(['Date1', 'Date2'])
[((Timestamp('2017-05-30 16:00:00'), Timestamp('2017-06-16 16:00:00')),
Date1 Date2 Value1 Value2 diff
0 2017-05-30 16:00:00 2017-06-16 16:00:00 2.97 96 13.0),
((Timestamp('2017-05-30 16:00:00'), Timestamp('2017-07-21 16:00:00')),
Date1 Date2 Value1 Value2 diff
1 2017-05-30 16:00:00 2017-07-21 16:00:00 3.3 14 38.0),
((Timestamp('2017-05-30 16:00:00'), Timestamp('2017-08-18 16:00:00')),
Date1 Date2 Value1 Value2 diff
2 2017-05-30 16:00:00 2017-08-18 16:00:00 4.03 2 58.0)]
You need a type cast to datetime64[D]
to make numpy happy like:
import numpy as np
def date_diff(start_dates, end_dates):
return np.busday_count(
start_dates.values.astype('datetime64[D]'),
end_dates.values.astype('datetime64[D]'))
import pandas as pd
df = pd.DataFrame.from_dict({'Date1': ['2017-05-30 16:00:00',
'2017-05-30 16:00:00',
'2017-05-30 16:00:00'],
'Date2': ['2017-06-16 16:00:00',
'2017-07-21 16:00:00',
'2017-08-18 16:00:00'],
'Value1': [2.97, 3.3, 4.03],
'Value2': [96L, 14L, 2L]})
df['Date1'] = pd.to_datetime(df['Date1'])
df['Date2'] = pd.to_datetime(df['Date2'])
df['DateDiff'] = date_diff(df['Date1'], df['Date2'])
print(df)
Date1 Date2 Value1 Value2 DateDiff
0 2017-05-30 16:00:00 2017-06-16 16:00:00 2.97 96 13
1 2017-05-30 16:00:00 2017-07-21 16:00:00 3.30 14 38
2 2017-05-30 16:00:00 2017-08-18 16:00:00 4.03 2 58