I have a dataframe with a column of dates and several columns with a bunch of other values. When I print the column I get the message after running:
from datetime import datetime, timedelta
In [84]: df2[0]
Out[84]:
0 2023-01-01 00:00:00
1 2023-01-01 01:00:00
2 2023-01-01 02:00:00
3 2023-01-01 03:00:00
4 2023-01-01 04:00:00
...
8657 2022-12-31 19:00:00
8658 2022-12-31 20:00:00
8659 2022-12-31 21:00:00
8660 2022-12-31 22:00:00
8661 2022-12-31 23:00:00
Name: 0, Length: 8662, dtype: datetime64[ns]
but when I print a single entry I get the message:
In [85]: type(df2[0][0])
Out[85]: pandas._libs.tslibs.timestamps.Timestamp
Firstly, can someone explain why and what the difference is?
My goal is to be able to extract this column of its unique entries (there are multiple of each) then have those in a new df, where I will need to adjust them by subtracting a day or two which I can then add as a new column. For example:
dates = df['BusinessUtc'].unique()
df2 = pd.DataFrame(dates)
df2['ND_eval1'] = ''
df2['ND_eval1'][0] = dates[0] - datetime.timedelta(days=1)
Which gives the following error
type object 'datetime.datetime' has no attribute 'timedelta'
I tried converting to the entries using datetime.strptime() but I feel this shouldn't be necessary, if the entries are already in datetime format why can't I just copy paste them and make some subtractions?
You have already imported the timedelta
. You must use like this:
df2['ND_eval1'][0] = dates[0] - timedelta(days=1)
if you run this you will get an error like this:
UFuncTypeError: ufunc 'subtract' cannot use operands with types dtype('<M8[ns]') and dtype('O')
Because:
print(type(dates)) # -- > <class 'numpy.ndarray'>
print(type(dates[0])) # -- > <class 'numpy.datetime64'>
You are trying to manipulate a numpy type date using pandas. You have to convert numpy.datetime64
to pandas equivalent like this:
df2['ND_eval1'][0] = pd.Timestamp(dates[0]) - .timedelta(days=1)
In addition:
You can use df['BusinessUtc'].value_counts().index
instead of df['BusinessUtc'].unique()
It returns a data type that you can use in this pandas (<class 'pandas._libs.tslibs.timestamps.Timestamp'>
) and no need pd.Timestamp
.
For access or edit single value you should use DataFrame.at
like this:
df2.at[0,"ND_eval1"] = dates[0] - timedelta(days=1)