I'm trying to do time series forecasting in Python.
Before I start, I have some doubts about how to prepare the source dataset.
I just want to understand how the data should be structured.
Let's say I have several departments, and each department has multiple teams. I want to do time series forecasting on total sales by department.
I can prepare the data in the below options:
Most of the tutorials I have seen online use Option 2, but I prefer Option 1.
The reason is that if new departments are added in the future, they can be added as rows, whereas in Option 2 I would need to add another column each time.
My questions are:
Can I use the structure in Option 1 for preparing my dataset?
If yes: in the Date column, I can see that 1st June has 3 records, one for each team in a department. Is there any requirement that a date should appear in only one row?
In Option 1, let's say I want to predict total sales by department. Will having an additional column like Team Name have any impact when building models for time series forecasting?
I would be really glad if someone could help. Thanks in advance.
When making a forecast, your data preparation will depend on what answers you are trying to find (don't get me wrong, I'm not saying you should manipulate your preparation to get the answers you want). What I mean is this: you say you want to forecast total sales by department. That implies you don't care about the teams within a department. In that case Option 1 is not ideal, because to get the total sales of any department you first have to do some work to calculate it, instead of simply reading the value you need.
However, it is very common to have your source data at a more detailed level than the one at which you are going to use it. The key takeaway here is that you are going to use Python to read this data. Aggregating the data to the level you need should be done in Python, and it is absolutely fine to store it in more detail in, for example, a .csv file.
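As a minimal sketch of that aggregation step, assuming Option 1 is a long table with one row per date and team (the column names Date, Department, Team and Sales and the sample values below are made up for illustration, not taken from your data), pandas can sum the team rows into one row per date and department:

```python
import pandas as pd

# Hypothetical long-format source data: one row per (Date, Team).
# Column names and values are assumptions for illustration only.
raw = pd.DataFrame({
    "Date": pd.to_datetime(["2020-06-01"] * 3 + ["2020-06-02"] * 3),
    "Department": ["A", "A", "B", "A", "A", "B"],
    "Team": ["T1", "T2", "T3", "T1", "T2", "T3"],
    "Sales": [10, 20, 5, 12, 18, 7],
})

# Sum over teams so that each (Date, Department) pair appears exactly once.
dept_sales = raw.groupby(["Date", "Department"], as_index=False)["Sales"].sum()

# Optional: reshape to one column per department, indexed by date,
# which is the layout many forecasting tutorials start from.
wide = dept_sales.pivot(index="Date", columns="Department", values="Sales")

print(dept_sales)
print(wide)
```

With real data you would replace the hand-built DataFrame with something like `pd.read_csv(...)` plus date parsing. The point is only that the detailed layout can stay in the file while the model sees one value per date for each department.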
To answer your questions:
Since you want to forecast on department and not on Team, you can for example use the pandas library to aggregate your data on department after you read it. There is no use in keeping the detailed Team information at that point.
You have good questions, but it is difficult to give a clear and complete answer to all of them. My advice would be to get any kind of result as quickly as possible, while being clear about the choices you make along the way. Once you have a result, you can fine-tune it and review your earlier decisions. No forecasting model is ever perfect, or done in one try.