I'm trying to automate the process of predicting (1) the total demand of each State and (2) demand of each Customer in each State. The statistic method applied is Moving Average. The predict time is 1 month ahead.The data is imported from an excel sheet having 5 columns: Customer, State, Product, Quantity, Order Date. The excel file can be found via the link https://drive.google.com/file/d/1JlIqWl8bfyJ3Io01Zx088GIAC6rRuCa8/view?usp=sharing
One Customer can be associated with different States, for example, Aaron Bergman can buy Chair, Art, Phone from stores in Washington, Texas and Oklahoma. The other customers has the same purchase behaviour. For (1) I tried using For loop, but it did not work. The error is Order_Date not in index
df = pd.read_excel("Sales_data.xlsx")
State_Name = df.State.unique()
Customer_Name = df.Customer.unique()
for x in State_Name:
df = df[['Order_Date', 'Quantity']]
df['Order_Date'].min(), df['Order_Date'].max()
df.isnull().sum()
df.Timestamp = pd.to_datetime(df.Order_Date, format= '%D-%M-%Y %H:%m')
df.index = df.Timestamp
df = df.resample('MS').sum()
rolling_mean = df.Quantity.rolling(window=10).mean()
Consider turning for
loop lines into a defined method and call it with groupby
to return time series. Also, heed best practices in pandas
:
[]
.[]
with a list for column subsetting. Instead, use reindex
.def rollmean_func(df):
# BETTER COLUMN SUBSET
df = df.reindex(['Order_Date', 'Quantity'], axis='columns')
# BETTER COLUMN ASSIGNMENT
df['Timestamp'] = pd.to_datetime(df['Order_Date'], format= '%D-%M-%Y %H:%m')
df.index = df['Timestamp']
df = df.resample('MS').sum()
rolling_mean = df['Quantity'].rolling(window=10).mean()
return rolling_mean
States Level
state_rollmeans = df.groupby(['State']).apply(rollmean_func)
state_rollmeans
# State Timestamp
# Alabama 2014-04-01 NaN
# 2014-05-01 NaN
# 2014-06-01 NaN
# 2014-07-01 NaN
# 2014-08-01 NaN
# ...
# Wisconsin 2017-09-01 10.6
# 2017-10-01 7.5
# 2017-11-01 9.7
# 2017-12-01 12.3
# Wyoming 2016-11-01 NaN
# Name: Quantity, Length: 2070, dtype: float64
Customers Level
customer_rollmeans = df.groupby(['Customer_Name']).apply(rollmean_func)
customer_rollmeans
# Customer_Name Timestamp
# Aaron Bergman 2014-02-01 NaN
# 2014-03-01 NaN
# 2014-04-01 NaN
# 2014-05-01 NaN
# 2014-06-01 NaN
# ...
# Zuschuss Donatelli 2017-02-01 1.2
# 2017-03-01 0.7
# 2017-04-01 0.7
# 2017-05-01 0.0
# 2017-06-01 0.3
# Name: Quantity, Length: 26818, dtype: float64