I have really been having a tough time here.
My DataFrame looks like this:
Purchase_Date Customer_ID Gender
0 2012-12-18 00:00:00 7223 F
1 2012-12-20 00:00:00 7841 M
2 2012-12-21 00:00:00 8374 F
My goal is to convert the Purchase_Date column from string to datetime so that I can run a cohort analysis by applying this function to it:
def get_month(x): return dt.datetime(x.year, x.month, 1)
data['InvoiceMonth'] = data['Purchase_Date'].apply(get_month)
grouping = data.groupby('Customer_ID')['InvoiceMonth']
data['CohortMonth'] = grouping.transform('min')
The function raises the error: 'str' object has no attribute 'year'. I have tried the following functions and played with all the arguments (dayfirst, yearfirst, ...):
data["Purchase_Date"] = pd.to_datetime(data["Purchase_Date"])
pd.to_datetime()
datetime.datetime.strptime()
I keep getting ValueError: day is out of range for month
Please help out
So, you were almost there:
data["Purchase_Date"] = pd.to_datetime(data["Purchase_Date"])
data['InvoiceMonth'] = data["Purchase_Date"].dt.strftime("%Y-%m-01")
(This outputs the month in object (string) format. You can convert it to datetime by wrapping the result in pd.to_datetime(...).)
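As a minimal sketch of that conversion, the snippet below rebuilds the three sample rows from the question, wraps the strftime result in pd.to_datetime, and then applies the cohort step from the question. The inline DataFrame is only a stand-in for the real data.

```python
import pandas as pd

# Stand-in for the real data: the three sample rows shown in the question
data = pd.DataFrame({
    "Purchase_Date": ["2012-12-18 00:00:00", "2012-12-20 00:00:00", "2012-12-21 00:00:00"],
    "Customer_ID": [7223, 7841, 8374],
    "Gender": ["F", "M", "F"],
})

data["Purchase_Date"] = pd.to_datetime(data["Purchase_Date"])

# strftime returns strings; pd.to_datetime turns them back into datetime64 values
data["InvoiceMonth"] = pd.to_datetime(data["Purchase_Date"].dt.strftime("%Y-%m-01"))

# Earliest invoice month per customer, as in the question's cohort step
data["CohortMonth"] = data.groupby("Customer_ID")["InvoiceMonth"].transform("min")

print(data.dtypes)
```

With datetime values in InvoiceMonth, the groupby/transform('min') step compares real dates rather than strings.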
Or alternatively - using your approach:
import datetime as dt
data["Purchase_Date"] = pd.to_datetime(data["Purchase_Date"])
def get_month(x): return dt.datetime(x.year, x.month, 1)
data['InvoiceMonth'] = data["Purchase_Date"].apply(get_month)
(This outputs the month as datetime.)
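If you want a datetime result without calling apply row by row, one possible alternative (not part of the two options above) is to go through a monthly period and back to a timestamp. This is only a sketch on made-up sample dates; the variable names are placeholders.

```python
import pandas as pd

# Placeholder sample dates, already parsed to datetime64
purchase_dates = pd.to_datetime(pd.Series(["2012-12-18", "2012-12-20", "2012-12-21"]))

# to_period("M") truncates to the month; to_timestamp() returns the first day of that month
invoice_month = purchase_dates.dt.to_period("M").dt.to_timestamp()

print(invoice_month)
```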
Both produce the following output, although I would strongly recommend the first option:
Purchase_Date Customer_ID Gender InvoiceMonth
0 2012-12-18 7223 F 2012-12-01
1 2012-12-20 7841 M 2012-12-01
2 2012-12-21 8374 F 2012-12-01
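If pd.to_datetime still raises "day is out of range for month" on the full data, some rows probably contain dates that do not exist on the calendar. A rough way to find them is to parse with errors="coerce", which turns unparseable values into NaT, and then look at the rows that came back empty. The sample below is hypothetical (the second value is deliberately invalid), and the format string assumes the layout shown in the question.

```python
import pandas as pd

# Hypothetical sample: the second value is deliberately invalid (February 30)
raw = pd.Series(["2012-12-18 00:00:00", "2012-02-30 00:00:00", "2012-12-21 00:00:00"])

# errors="coerce" replaces values that cannot be parsed with NaT instead of raising
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Original strings for the rows that failed to parse
bad_rows = raw[parsed.isna()]

print(bad_rows)
```

Once the offending rows are visible, you can decide whether to correct them, drop them, or parse them with a different format.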