Search code examples
pythonpandasdatetimetimedelta

cannot add the type DateOffset to a TimedeltaArray


Let's say the dataframe (df) consists of 3 columns.

V1  V2              V3
1   0 days 23:09:00 0 days 23:34:00
1   0 days 23:36:00 1 days 00:03:00
1   1 days 00:06:00 1 days 00:29:00
1   1 days 00:31:00 1 days 00:57:00
2   0 days 22:40:00 0 days 23:04:00
2   0 days 23:09:00 0 days 23:35:00
2   0 days 23:37:00 1 days 00:01:00
2   1 days 00:06:00 1 days 00:30:00
2   1 days 00:33:00 1 days 00:56:00
3   0 days 22:50:00 0 days 23:21:09
3   0 days 23:38:56 1 days 00:09:00
3   1 days 00:12:00 1 days 00:42:09

I have used the following code:

df['V4']=(df.groupby('V1')['V3'] - df.groupby('V1')['V2'].shift(1)).astype('timedelta64[m]')

Essentially, I want to perform operation for each unique value in V1 and the result should look like this:

V1  V2              V3              V4
1   0 days 23:09:00 0 days 23:34:00 NaN
1   0 days 23:36:00 1 days 00:03:00 54
1   1 days 00:06:00 1 days 00:29:00 53
1   1 days 00:31:00 1 days 00:57:00 51
2   0 days 22:40:00 0 days 23:04:00 NaN
2   0 days 23:09:00 0 days 23:35:00 55
2   0 days 23:37:00 1 days 00:01:00 52
2   1 days 00:06:00 1 days 00:30:00 53
2   1 days 00:33:00 1 days 00:56:00 50
3   0 days 22:50:00 0 days 23:21:09 NaN
3   0 days 23:38:56 1 days 00:09:00 79
3   1 days 00:12:00 1 days 00:42:09 63

Error received:

Cannot add/subtract non-tick DateOffset to TimedeltaArray

Datatypes:

{'V1': {1: 1, 2: 2, 3: 3}, 'V2': {0: Timedelta('0 days 23:09:00'), 1: Timedelta('0 days 23:36:00')}, 'V3': {0: Timedelta('0 days 23:34:00'), 1: Timedelta('1 days 00:03:00')}, 'V4': {0: 54, 1: 53}}

Solution

  • Try this:

    1. Do the subtraction on all rows
    2. Set the value as NaN when there is a change in V1.
    df = df.sort_values(["V1", "V2", "V3"])
    df["V4"] = (df["V3"]-df["V2"].shift()).dt.seconds//60
    df["V4"] = df["V4"].where(df["V1"]==df["V1"].shift())
    
    >>> df
        V1              V2              V3    V4
    0    1 0 days 23:09:00 0 days 23:34:00   NaN
    1    1 0 days 23:36:00 1 days 00:03:00  54.0
    2    1 1 days 00:06:00 1 days 00:29:00  53.0
    3    1 1 days 00:31:00 1 days 00:57:00  51.0
    4    2 0 days 22:40:00 0 days 23:04:00   NaN
    5    2 0 days 23:09:00 0 days 23:35:00  55.0
    6    2 0 days 23:37:00 1 days 00:01:00  52.0
    7    2 1 days 00:06:00 1 days 00:30:00  53.0
    8    2 1 days 00:33:00 1 days 00:56:00  50.0
    9    3 0 days 22:50:00 0 days 23:21:09   NaN
    10   3 0 days 23:38:56 1 days 00:09:00  79.0
    11   3 1 days 00:12:00 1 days 00:42:09  63.0
    

    If you want to use groupby:

    df["V4"] = df.groupby("V1").apply(lambda x: (x["V3"]-x["V2"].shift()).dt.seconds//60).reset_index(drop=True)
    
    >>> df
        V1              V2              V3    V4
    0    1 0 days 23:09:00 0 days 23:34:00   NaN
    1    1 0 days 23:36:00 1 days 00:03:00  54.0
    2    1 1 days 00:06:00 1 days 00:29:00  53.0
    3    1 1 days 00:31:00 1 days 00:57:00  51.0
    4    2 0 days 22:40:00 0 days 23:04:00   NaN
    5    2 0 days 23:09:00 0 days 23:35:00  55.0
    6    2 0 days 23:37:00 1 days 00:01:00  52.0
    7    2 1 days 00:06:00 1 days 00:30:00  53.0
    8    2 1 days 00:33:00 1 days 00:56:00  50.0
    9    3 0 days 22:50:00 0 days 23:21:09   NaN
    10   3 0 days 23:38:56 1 days 00:09:00  79.0
    11   3 1 days 00:12:00 1 days 00:42:09  63.0
    
    Input:
    df = pd.DataFrame({"V1": [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3],
                       "V2": [pd.Timedelta("0 days 23:09:00"), pd.Timedelta("0 days 23:36:00"), pd.Timedelta("1 days 00:06:00"), pd.Timedelta("1 days 00:31:00"),
                              pd.Timedelta("0 days 22:40:00"), pd.Timedelta("0 days 23:09:00"), pd.Timedelta("0 days 23:37:00"), pd.Timedelta("1 days 00:06:00"),
                              pd.Timedelta("1 days 00:33:00"), pd.Timedelta("0 days 22:50:00"), pd.Timedelta("0 days 23:38:56"), pd.Timedelta("1 days 00:12:00")],
                       "V3":[pd.Timedelta("0 days 23:34:00"), pd.Timedelta("1 days 00:03:00"), pd.Timedelta("1 days 00:29:00"), pd.Timedelta("1 days 00:57:00"),
                             pd.Timedelta("0 days 23:04:00"), pd.Timedelta("0 days 23:35:00"), pd.Timedelta("1 days 00:01:00"), pd.Timedelta("1 days 00:30:00"),
                             pd.Timedelta("1 days 00:56:00"), pd.Timedelta("0 days 23:21:09"), pd.Timedelta("1 days 00:09:00"), pd.Timedelta("1 days 00:42:09")]
                       })
    
    >>> df
        V1              V2              V3
    0    1 0 days 23:09:00 0 days 23:34:00
    1    1 0 days 23:36:00 1 days 00:03:00
    2    1 1 days 00:06:00 1 days 00:29:00
    3    1 1 days 00:31:00 1 days 00:57:00
    4    2 0 days 22:40:00 0 days 23:04:00
    5    2 0 days 23:09:00 0 days 23:35:00
    6    2 0 days 23:37:00 1 days 00:01:00
    7    2 1 days 00:06:00 1 days 00:30:00
    8    2 1 days 00:33:00 1 days 00:56:00
    9    3 0 days 22:50:00 0 days 23:21:09
    10   3 0 days 23:38:56 1 days 00:09:00
    11   3 1 days 00:12:00 1 days 00:42:09