Embedded R Try Except Add Loop


I'm following along with the blog post below as a Python/R novice and am having trouble adding a loop to the code below. Currently I'm able to get the code to run in full, but it only outputs the seasonal flag for one customer. I would like it to loop and run for all of my customers.

datamovesme.com/2018/07/01/seasonality-python-code

##Here comes the R code piece     
     try:
          seasonal = r(''' 
          fit<-tbats(customerTS, seasonal.periods = 12, use.parallel = TRUE)
          fit$seasonal
          ''')
      except: seasonal = 1
      seasonal_output = seasonal_output.append({'customer_id':customerid, 'seasonal': seasonal}, ignore_index=True)
      print(f' {customerid} | {seasonal} ')
print(seasonal_output)
seasonal_output.to_csv(outfile)

I've tried many combinations of code to get it to loop, too many to list here. The blog shows the existing data frames and time-series objects that are available, but I am not sure which one to use or how to pass it to the R code.

Thanks!


Solution

  • The code in the linked blog post has two issues:

    1. The code is not indented as Python syntax requires. This may be due to how the website renders whitespace or tabs, but it is a disservice to readers, because a missing indent changes the output.

    2. The code ignores the inefficiency of appending to data frames: never call DataFrame.append or pd.concat inside a for-loop, because each call copies the whole frame and the total cost grows quadratically. (DataFrame.append was also deprecated in pandas 1.4 and removed in pandas 2.0.) Instead, since seasonal is a single value, build a list of dictionaries and pass it to the pd.DataFrame() constructor once, outside of the loop.

    After resolving the issues above and running the entire code block, your solution should output a data frame covering all customer IDs.

    # ... same above assignments ...
    outfile = '[put your file path here].csv'
    df_list = []
    
    # Group by a plain column name so customerid is a scalar, not a 1-tuple (pandas >= 2.0)
    for customerid, dataForCustomer in filledIn.groupby('customer_id'):
        startYear = dataForCustomer.head(1).iloc[0].yr
        startMonth = dataForCustomer.head(1).iloc[0].mnth
        endYear = dataForCustomer.tail(1).iloc[0].yr
        endMonth = dataForCustomer.tail(1).iloc[0].mnth
    
        #Creating a time series object
        customerTS = stats.ts(dataForCustomer.usage.astype(int),
                              start=base.c(startYear,startMonth),
                              end=base.c(endYear, endMonth), 
                              frequency=12)
        r.assign('customerTS', customerTS)
    
        ##Here comes the R code piece
        try:
            seasonal = r('''
                            fit<-tbats(customerTS, seasonal.periods = 12, use.parallel = TRUE)
                            fit$seasonal
                         ''')
        except Exception:
            # Fall back to 1 if the R call raises an error
            seasonal = 1
    
        # APPEND DICTIONARY TO LIST (NOT DATA FRAME)
        df_list.append({'customer_id': customerid, 'seasonal': seasonal})
        print(f' {customerid} | {seasonal} ')
    
    seasonal_output = pd.DataFrame(df_list)
    print(seasonal_output)
    seasonal_output.to_csv(outfile)
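
To check the loop structure without an R installation, the same pattern can be sketched in plain pandas. This is only an illustration under stated assumptions: `fake_seasonal_check` is a hypothetical stand-in for the `tbats` call (it is not part of rpy2 or the blog code), and the toy `filledIn` frame only mimics the `customer_id` and `usage` columns.

```python
import pandas as pd


def fake_seasonal_check(usage):
    """Hypothetical stand-in for the R tbats call (an assumption for this sketch).

    Raises on very short series, otherwise returns 12 or None.
    """
    if len(usage) < 3:
        raise ValueError("series too short")
    return 12 if max(usage) - min(usage) > 10 else None


# Toy data in the same long format as filledIn: one row per customer per month.
filledIn = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 2, 3],
    "usage": [5, 40, 6, 7, 8, 7, 9],
})

rows = []  # plain Python list; appending to it is cheap

for customerid, dataForCustomer in filledIn.groupby("customer_id"):
    try:
        seasonal = fake_seasonal_check(dataForCustomer["usage"].tolist())
    except Exception:
        seasonal = 1  # same fallback value as in the code above

    # Everything that must run once per customer stays indented inside the loop.
    rows.append({"customer_id": customerid, "seasonal": seasonal})

# Build the data frame once, after the loop has finished.
seasonal_output = pd.DataFrame(rows)
print(seasonal_output)
```

The key point is that the `rows.append(...)` line sits inside the `for` block, so one record is collected per customer, while the `pd.DataFrame(rows)` call sits outside it and runs only once.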