Problem statement (multiple linear regression): A digital media company (Netflix, etc.) launched a show. Initially, the show got a good response, but it then saw a decline in viewership. The company wants to figure out what went wrong.
I want to create an extra column, media['Day'], that counts the number of days the show has been running. Suppose the first day of the show is 1 March 2017, i.e. 2017-03-01.
The code I wrote is as follows.
media['Date'] = pd.to_datetime(media['Date'])
#deriving "days since the show started"
from datetime import date
d0 = date(2017, 2, 28)
d1 = media.Date #media is a dataframe variable
delta = d1 - d0
media['Day'] = delta
The error which I get is:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
3 d0 = date(2017, 2, 28)
4 d1 = media.Date #media is a dataframe variable
----> 5 delta = d1 - d0
6 media['Day'] = delta
c:\DEV\work\lib\site-packages\pandas\core\ops\__init__.py in wrapper(left, right)
990 # test_dt64_series_add_intlike, which the index dispatching handles
991 # specifically.
--> 992 result = dispatch_to_index_op(op, left, right, pd.DatetimeIndex)
993 return construct_result(
994 left, result, index=left.index, name=res_name, dtype=result.dtype
c:\DEV\work\lib\site-packages\pandas\core\ops\__init__.py in dispatch_to_index_op(op, left, right, index_class)
628 left_idx = left_idx._shallow_copy(freq=None)
629 try:
--> 630 result = op(left_idx, right)
631 except NullFrequencyError:
632 # DatetimeIndex and TimedeltaIndex with freq == None raise ValueError
TypeError: unsupported operand type(s) for -: 'DatetimeIndex' and 'datetime.date'
I can see that the data types don't match:
d0 is of type datetime.date, and
d1 is of type pandas.core.series.Series.
Can anyone help me convert or parse d0 so that it has the same type as the values in d1?
You need to convert the datetime.date object to a pandas datetime before you can compute the interval. To do this, wrap d0 in pd.to_datetime.
The following should work and gives the delta as a timedelta in days. If you want just the integer number of days, use the .dt accessor on the resulting timedelta series.
delta = d1 - pd.to_datetime(d0)
# or
delta = (d1 - pd.to_datetime(d0)).dt.days
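For completeness, here is a minimal self-contained sketch of the same fix. The three sample dates are made up for illustration and only stand in for the real media dataframe; the column names Date and Day follow the question.

```python
import datetime

import pandas as pd

# Hypothetical sample data standing in for the real `media` dataframe.
media = pd.DataFrame({"Date": ["2017-03-01", "2017-03-02", "2017-03-10"]})
media["Date"] = pd.to_datetime(media["Date"])

# The day before the premiere, so that 2017-03-01 counts as day 1.
d0 = datetime.date(2017, 2, 28)

# Converting d0 to a pandas Timestamp makes the subtraction well defined.
# The result is a timedelta Series, and .dt.days extracts whole days as integers.
media["Day"] = (media["Date"] - pd.to_datetime(d0)).dt.days

print(media["Day"].tolist())  # [1, 2, 10]
```

If d0 does not need to be a datetime.date elsewhere in your code, defining it directly as pd.Timestamp("2017-02-28") should avoid the conversion step altogether.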