Tags: python, pandas, linear-regression, python-datetime

Converting datetime.date to pandas.core.series.Series in Python?


Problem Statement: (Multiple Linear regression) A digital media company (Netflix, etc.) had launched a show. Initially, the show got a good response, but then witnessed a decline in viewership. The company wants to figure out what went wrong.

I want to create an extra column, media['Day'], that counts the number of days the show has been running. Suppose the first day of the show is 1 March 2017, i.e. 2017-03-01.

The code I have written is as follows.

media['Date'] = pd.to_datetime(media['Date'])

# deriving "days since the show started"
from datetime import date

d0 = date(2017, 2, 28)
d1 = media.Date             #media is a dataframe variable
delta = d1 - d0
media['Day'] = delta

The error which I get is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in 
      3 d0 = date(2017, 2, 28)
      4 d1 = media.Date             #media is a dataframe variable
----> 5 delta = d1 - d0
      6 media['Day'] = delta

c:\DEV\work\lib\site-packages\pandas\core\ops\__init__.py in wrapper(left, right)
    990             # test_dt64_series_add_intlike, which the index dispatching handles
    991             # specifically.
--> 992             result = dispatch_to_index_op(op, left, right, pd.DatetimeIndex)
    993             return construct_result(
    994                 left, result, index=left.index, name=res_name, dtype=result.dtype

c:\DEV\work\lib\site-packages\pandas\core\ops\__init__.py in dispatch_to_index_op(op, left, right, index_class)
    628         left_idx = left_idx._shallow_copy(freq=None)
    629     try:
--> 630         result = op(left_idx, right)
    631     except NullFrequencyError:
    632         # DatetimeIndex and TimedeltaIndex with freq == None raise ValueError

TypeError: unsupported operand type(s) for -: 'DatetimeIndex' and 'datetime.date'

I can see that the data types do not match: d0 is of type datetime.date, while d1 is of type pandas.core.series.Series.

Can anyone show me how to convert d0 so that it is compatible with d1?


Solution

  • As the traceback shows, pandas will not subtract a plain datetime.date from the datetime Series, so d0 has to be converted first. To do this, wrap d0 in pd.to_datetime, which turns it into a pandas Timestamp.

    The following should then work. The first form gives a Series of timedeltas; if you want just the integer number of days, use the dt accessor on the result, as in the second form.

    delta = d1 - pd.to_datetime(d0)
    # or
    delta = (d1 - pd.to_datetime(d0)).dt.days
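
    Putting it together, here is a minimal end-to-end sketch. The three-row media frame is made-up sample data standing in for the real one (an assumption, since the question does not show the data); the column names Date and Day follow the question.

```python
from datetime import date

import pandas as pd

# Hypothetical sample data standing in for the real `media` DataFrame.
media = pd.DataFrame({"Date": ["2017-03-01", "2017-03-02", "2017-03-10"]})
media["Date"] = pd.to_datetime(media["Date"])

# Day before the premiere, so that 2017-03-01 counts as day 1.
d0 = pd.to_datetime(date(2017, 2, 28))

# Subtracting a Timestamp from a datetime Series gives a Series of timedeltas;
# .dt.days extracts the whole-day component as integers.
media["Day"] = (media["Date"] - d0).dt.days

print(media)
```

    With this sample data the Day column comes out as 1, 2 and 10. If you would rather count from zero, use the premiere date itself (2017-03-01) as d0.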