I have a large data frame with transaction data. What I am trying to do is use python to aggregate the data starting with zip codes, then a year and month, finally the total number of transactions for that month.
My Df:
Date VAR1 VAR2 ZipCode Transactions
YYYY-MM-DD. X. Y. 12345. 1.
So the first thing I did was convert the to date time
df['Date'] = pd.to_datetime(df['Date'])
df.info()
# Date datetime64[ns]
Then I split the data into year-month and number of transactions:
# grouping the data by year and month
per = df.Date.dt.to_period("M")
g = df.groupby(per)
g.sum() # so now that this works, we need to break it up into zip codes
Which gives an output of:
Date. Transactions
YYYY-MM. X
YYYY-MM. Y
My questions is, what am I missing to get the zipcodes in front:
ZipCode. Date. Transactions
123345. YYYY-MM. sum()
Any and all help is greatly apprecaited
I believe you need add column ZipCode
to groupby
if need grouping per zip and per months:
per = df.Date.dt.to_period("M")
df1 = df.groupby(['ZipCode',per])['Transactions'].sum().reset_index()