Search code examples
python-3.xpandasgroup-bytime-serieszipcode

Python Pandas stack by zip code and group by month/year


I have a large data frame with transaction data. What I am trying to do is use python to aggregate the data starting with zip codes, then a year and month, finally the total number of transactions for that month.

My Df:

  Date        VAR1   VAR2    ZipCode    Transactions
YYYY-MM-DD.    X.     Y.     12345.         1.      

So the first thing I did was convert the to date time

 df['Date'] = pd.to_datetime(df['Date'])
 df.info()
 # Date datetime64[ns]

Then I split the data into year-month and number of transactions:

# grouping the data by year and month
per = df.Date.dt.to_period("M")  
g = df.groupby(per)
g.sum() # so now that this works, we need to break it up into zip codes

Which gives an output of:

Date.       Transactions
YYYY-MM.        X
YYYY-MM.        Y

My questions is, what am I missing to get the zipcodes in front:

ZipCode.     Date.    Transactions
 123345.   YYYY-MM.     sum()

Any and all help is greatly apprecaited


Solution

  • I believe you need add column ZipCode to groupby if need grouping per zip and per months:

    per = df.Date.dt.to_period("M")
    df1 = df.groupby(['ZipCode',per])['Transactions'].sum().reset_index()