Search code examples
pythonpandastime-seriespredictionforecasting

Time series analysis For loop Python


I'm trying to automate the process of predicting (1) the total demand of each State and (2) demand of each Customer in each State. The statistic method applied is Moving Average. The predict time is 1 month ahead.The data is imported from an excel sheet having 5 columns: Customer, State, Product, Quantity, Order Date. The excel file can be found via the link https://drive.google.com/file/d/1JlIqWl8bfyJ3Io01Zx088GIAC6rRuCa8/view?usp=sharing

One Customer can be associated with different States, for example, Aaron Bergman can buy Chair, Art, Phone from stores in Washington, Texas and Oklahoma. The other customers has the same purchase behaviour. For (1) I tried using For loop, but it did not work. The error is Order_Date not in index

df = pd.read_excel("Sales_data.xlsx")
State_Name = df.State.unique()
Customer_Name = df.Customer.unique()

for x in State_Name:
   df = df[['Order_Date', 'Quantity']]
   df['Order_Date'].min(), df['Order_Date'].max()
   df.isnull().sum()

   df.Timestamp = pd.to_datetime(df.Order_Date, format= '%D-%M-%Y %H:%m')
   df.index = df.Timestamp
   df = df.resample('MS').sum()

   rolling_mean = df.Quantity.rolling(window=10).mean()



Solution

  • Consider turning for loop lines into a defined method and call it with groupby to return time series. Also, heed best practices in pandas:

    def rollmean_func(df):
       # BETTER COLUMN SUBSET
       df = df.reindex(['Order_Date', 'Quantity'], axis='columns')  
    
       # BETTER COLUMN ASSIGNMENT
       df['Timestamp'] = pd.to_datetime(df['Order_Date'], format= '%D-%M-%Y %H:%m')  
       df.index = df['Timestamp']
    
       df = df.resample('MS').sum()
       rolling_mean = df['Quantity'].rolling(window=10).mean()
      
       return rolling_mean
    

    States Level

    state_rollmeans = df.groupby(['State']).apply(rollmean_func)
    state_rollmeans
    # State      Timestamp 
    # Alabama    2014-04-01     NaN
    #            2014-05-01     NaN
    #            2014-06-01     NaN
    #            2014-07-01     NaN
    #            2014-08-01     NaN
    # ...
    # Wisconsin  2017-09-01    10.6
    #            2017-10-01     7.5
    #            2017-11-01     9.7
    #            2017-12-01    12.3
    # Wyoming    2016-11-01     NaN
    # Name: Quantity, Length: 2070, dtype: float64
    

    Customers Level

    customer_rollmeans = df.groupby(['Customer_Name']).apply(rollmean_func)
    customer_rollmeans
    # Customer_Name       Timestamp 
    # Aaron Bergman       2014-02-01    NaN
    #                     2014-03-01    NaN
    #                     2014-04-01    NaN
    #                     2014-05-01    NaN
    #                     2014-06-01    NaN
    # ...
    # Zuschuss Donatelli  2017-02-01    1.2
    #                     2017-03-01    0.7
    #                     2017-04-01    0.7
    #                     2017-05-01    0.0
    #                     2017-06-01    0.3
    # Name: Quantity, Length: 26818, dtype: float64