python pandas time-series prediction forecasting

Time series analysis For loop Python

I'm trying to automate the process of predicting (1) the total demand of each State and (2) demand of each Customer in each State. The statistic method applied is Moving Average. The predict time is 1 month ahead.The data is imported from an excel sheet having 5 columns: Customer, State, Product, Quantity, Order Date. The excel file can be found via the link https://drive.google.com/file/d/1JlIqWl8bfyJ3Io01Zx088GIAC6rRuCa8/view?usp=sharing

One Customer can be associated with different States, for example, Aaron Bergman can buy Chair, Art, Phone from stores in Washington, Texas and Oklahoma. The other customers has the same purchase behaviour. For (1) I tried using For loop, but it did not work. The error is Order_Date not in index

df = pd.read_excel("Sales_data.xlsx")
State_Name = df.State.unique()
Customer_Name = df.Customer.unique()

for x in State_Name:
   df = df[['Order_Date', 'Quantity']]
   df['Order_Date'].min(), df['Order_Date'].max()
   df.isnull().sum()

   df.Timestamp = pd.to_datetime(df.Order_Date, format= '%D-%M-%Y %H:%m')
   df.index = df.Timestamp
   df = df.resample('MS').sum()

   rolling_mean = df.Quantity.rolling(window=10).mean()

Solution

Consider turning for loop lines into a defined method and call it with groupby to return time series. Also, heed best practices in pandas:

Avoid referencing columns as attributes with period qualifiers. Instead, use bracketing [].
Avoid [] with a list for column subsetting. Instead, use reindex.

def rollmean_func(df):
   # BETTER COLUMN SUBSET
   df = df.reindex(['Order_Date', 'Quantity'], axis='columns')  

   # BETTER COLUMN ASSIGNMENT
   df['Timestamp'] = pd.to_datetime(df['Order_Date'], format= '%D-%M-%Y %H:%m')  
   df.index = df['Timestamp']

   df = df.resample('MS').sum()
   rolling_mean = df['Quantity'].rolling(window=10).mean()
  
   return rolling_mean

States Level

state_rollmeans = df.groupby(['State']).apply(rollmean_func)
state_rollmeans
# State      Timestamp 
# Alabama    2014-04-01     NaN
#            2014-05-01     NaN
#            2014-06-01     NaN
#            2014-07-01     NaN
#            2014-08-01     NaN
# ...
# Wisconsin  2017-09-01    10.6
#            2017-10-01     7.5
#            2017-11-01     9.7
#            2017-12-01    12.3
# Wyoming    2016-11-01     NaN
# Name: Quantity, Length: 2070, dtype: float64

Customers Level

customer_rollmeans = df.groupby(['Customer_Name']).apply(rollmean_func)
customer_rollmeans
# Customer_Name       Timestamp 
# Aaron Bergman       2014-02-01    NaN
#                     2014-03-01    NaN
#                     2014-04-01    NaN
#                     2014-05-01    NaN
#                     2014-06-01    NaN
# ...
# Zuschuss Donatelli  2017-02-01    1.2
#                     2017-03-01    0.7
#                     2017-04-01    0.7
#                     2017-05-01    0.0
#                     2017-06-01    0.3
# Name: Quantity, Length: 26818, dtype: float64