I'm following along with the blog post linked below as a Python/R novice, and I'm having trouble adding a loop to the code below. Currently I can get the code to run in full, but it only outputs the seasonal flag for one customer. I would like it to loop over and run for all of my customers.
datamovesme.com/2018/07/01/seasonality-python-code
##Here comes the R code piece
try:
    seasonal = r('''
        fit<-tbats(customerTS, seasonal.periods = 12, use.parallel = TRUE)
        fit$seasonal
    ''')
except: seasonal = 1
seasonal_output = seasonal_output.append({'customer_id':customerid, 'seasonal': seasonal}, ignore_index=True)
print(f' {customerid} | {seasonal} ')
print(seasonal_output)
seasonal_output.to_csv(outfile)
I've tried many combinations of code to get it to loop, too many to list here. The blog shows the existing data frames and time-series objects that are available to us. I am not sure which one to use or how to pass it to the R code.
Thanks!
The code in the linked blog post has two issues:
The code is not indented as Python syntax requires. This may be due to how the website renders whitespace or tabs, but it is a disservice to readers, because a missing indent changes the output.
The code ignores the inefficiency of appending to data frames: never call DataFrame.append or pd.concat inside a for-loop, because it leads to quadratic copying. (DataFrame.append was also deprecated in pandas 1.4 and removed in pandas 2.0.) Instead, since seasonal is a single value, build a list of dictionaries and pass it to the pd.DataFrame() constructor outside of the loop.
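As a minimal sketch of the second point, the rows can be collected in a plain list and the data frame built once after the loop. The `usage_df` frame, its values, and the `is_seasonal` rule here are hypothetical stand-ins, not the blog's data or the TBATS check:

```python
import pandas as pd

# Hypothetical toy data standing in for the blog's `filledIn` data frame.
usage_df = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 3],
    "usage": [10, 12, 7, 7, 3, 9],
})

rows = []  # plain Python list; appending to it is cheap
for customer_id, customer_data in usage_df.groupby("customer_id"):
    # Placeholder rule (an assumption for illustration only, not TBATS):
    # flag a customer when their usage values are not all identical.
    is_seasonal = bool(customer_data["usage"].nunique() > 1)
    rows.append({"customer_id": customer_id, "seasonal": is_seasonal})

# Build the data frame once, outside the loop.
result = pd.DataFrame(rows)
print(result)
```

Each pass through the loop only appends a small dictionary, so no data frame is copied until the single pd.DataFrame() call at the end.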
After resolving the above issues and running the entire code block, your solution should output a data frame covering all customer IDs.
# ... same above assignments ...
outfile = '[put your file path here].csv'

df_list = []
# Group by the column name (not a one-element list) so customerid is a scalar
for customerid, dataForCustomer in filledIn.groupby('customer_id'):
    startYear = dataForCustomer.head(1).iloc[0].yr
    startMonth = dataForCustomer.head(1).iloc[0].mnth
    endYear = dataForCustomer.tail(1).iloc[0].yr
    endMonth = dataForCustomer.tail(1).iloc[0].mnth

    # Creating a time series object
    customerTS = stats.ts(dataForCustomer.usage.astype(int),
                          start=base.c(startYear, startMonth),
                          end=base.c(endYear, endMonth),
                          frequency=12)
    r.assign('customerTS', customerTS)

    ## Here comes the R code piece
    try:
        seasonal = r('''
            fit<-tbats(customerTS, seasonal.periods = 12, use.parallel = TRUE)
            fit$seasonal
        ''')
    except Exception:
        # Fall back to 1 if the R call fails
        seasonal = 1

    # APPEND DICTIONARY TO LIST (NOT DATA FRAME)
    df_list.append({'customer_id': customerid, 'seasonal': seasonal})
    print(f' {customerid} | {seasonal} ')

# Build the data frame once, outside the loop
seasonal_output = pd.DataFrame(df_list)
print(seasonal_output)
seasonal_output.to_csv(outfile)
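The indentation issue can be seen without any R or pandas at all. In this small sketch (the customer IDs and the `flag_for` helper are made up for illustration), the only difference between the two functions is whether the append is indented inside the for-loop. That difference is enough to reproduce the one-customer symptom from the question:

```python
customer_ids = ["A", "B", "C"]  # hypothetical IDs


def flag_for(customer_id):
    """Hypothetical stand-in for the per-customer seasonality check."""
    return customer_id != "B"


def collect_inside_loop(ids):
    rows = []
    for cid in ids:
        flag = flag_for(cid)
        # Indented under the for-loop: runs once per customer.
        rows.append({"customer_id": cid, "seasonal": flag})
    return rows


def collect_outside_loop(ids):
    rows = []
    for cid in ids:
        flag = flag_for(cid)
    # Not indented under the for-loop: runs once, after the last customer.
    rows.append({"customer_id": cid, "seasonal": flag})
    return rows


print(len(collect_inside_loop(customer_ids)))   # 3
print(len(collect_outside_loop(customer_ids)))  # 1
```

If the output file contains a single row, checking whether the append and print lines sit inside the loop body is a reasonable first step.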