Search code examples
pythondatetimetimedelta

I need to subtract one day from a datetime but the column of data is dtype: datetime64[ns] but the single cell entry is Timestamp


I have a dataframe with a column of dates and several columns with a bunch of other values. When I print the column I get the message after running:

from datetime import datetime, timedelta

In [84]: df2[0]
Out[84]: 
0      2023-01-01 00:00:00
1      2023-01-01 01:00:00
2      2023-01-01 02:00:00
3      2023-01-01 03:00:00
4      2023-01-01 04:00:00
       ...
8657   2022-12-31 19:00:00
8658   2022-12-31 20:00:00
8659   2022-12-31 21:00:00
8660   2022-12-31 22:00:00
8661   2022-12-31 23:00:00
Name: 0, Length: 8662, dtype: datetime64[ns]

but when I print a single entry I get the message:

In [85]: type(df2[0][0])
Out[85]: pandas._libs.tslibs.timestamps.Timestamp

Firstly, can someone explain why and what the difference is?

My goal is to be able to extract this column of its unique entries (there are multiple of each) then have those in a new df, where I will need to adjust them by subtracting a day or two which I can then add as a new column. For example:

dates = df['BusinessUtc'].unique()
df2 = pd.DataFrame(dates)
df2['ND_eval1'] = ''

df2['ND_eval1'][0] = dates[0] - datetime.timedelta(days=1)

Which gives the following error

type object 'datetime.datetime' has no attribute 'timedelta'

I tried converting to the entries using datetime.strptime() but I feel this shouldn't be necessary, if the entries are already in datetime format why can't I just copy paste them and make some subtractions?


Solution

  • You have already imported the timedelta. You must use like this:

    df2['ND_eval1'][0] = dates[0] - timedelta(days=1)
    

    if you run this you will get an error like this:

    UFuncTypeError: ufunc 'subtract' cannot use operands with types dtype('<M8[ns]') and dtype('O')
    

    Because:

    print(type(dates)) #    -- > <class 'numpy.ndarray'>
    print(type(dates[0])) # -- > <class 'numpy.datetime64'>
    

    You are trying to manipulate a numpy type date using pandas. You have to convert numpy.datetime64 to pandas equivalent like this:

    df2['ND_eval1'][0] = pd.Timestamp(dates[0]) - .timedelta(days=1)
    

    In addition: You can use df['BusinessUtc'].value_counts().index instead of df['BusinessUtc'].unique() It returns a data type that you can use in this pandas (<class 'pandas._libs.tslibs.timestamps.Timestamp'>) and no need pd.Timestamp.

    For access or edit single value you should use DataFrame.at like this:

    df2.at[0,"ND_eval1"] = dates[0] - timedelta(days=1)