I am testing out Google Cloud Vertex AI with a time series AutoML model.
I have created a dataset from a BigQuery table with 2 columns, a timestamp and a numeric value I want to predict: `salesorderdate` is my TIMESTAMP column and `orders` is the value I want to predict.
When I proceed to the next step, I cannot select `orders` as my value to predict; there are no available options for this field.
What am I missing here? Surely the time series value is the target value in this case? Is there an expectation of more fields here, and can one actually add additional features as columns to a time series model in this way?
From your question, I gather that you are using forecasting models. Please note that this feature is in the "Preview" product launch stage, with all the consequences of that fact.
In the documentation, under Training data structure, you can find the following information:
- There must be at least two and no more than 1,000 columns.
For datasets that train AutoML models, one column must be the target, and there must be at least one feature available to train the model. If the training data does not include the target column, Vertex AI cannot associate the training data with the desired result.
I assume you are using AutoML models, so in this situation you need at least 3 columns in the dataset: a time column, a time series identifier column, and a target column.
If you want to predict `orders`, it should be the target column. However, by the time you reach the target selection, the time series identifier column has already been chosen in the previous step, so there is no column left to choose as the target.
So you need to add at least one additional column to your BigQuery table, which will be used as the time series identifier column. You can add a column that has the same value in every row. This approach is described in Forecasting data preparation best practices:
You can train a forecasting model on a single time series (in other words, the time series identifier column contains the same value for all rows). However, Vertex AI is a better fit for training data that contains two or more time series. For best results, you should have at least 10 time series for every column used to train the model.
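The constant-column workaround described above could be sketched roughly as follows. This is only an illustration: the helper `build_add_series_id_sql`, the table names, the column name `series_id`, and the constant value `all_orders` are all hypothetical and not taken from the question or the documentation. The helper just builds a BigQuery SQL statement that copies the source table and appends a column holding the same value in every row.

```python
def build_add_series_id_sql(
    source_table: str,
    dest_table: str,
    column_name: str = "series_id",       # hypothetical column name
    constant_value: str = "all_orders",   # hypothetical constant value
) -> str:
    """Return BigQuery SQL that copies source_table and appends a constant column."""
    # Reject quote characters so the names and the literal cannot break the SQL.
    for value in (source_table, dest_table, column_name, constant_value):
        if any(ch in value for ch in "`'\"\\"):
            raise ValueError(f"unsupported character in {value!r}")
    return (
        f"CREATE OR REPLACE TABLE `{dest_table}` AS\n"
        f"SELECT *, '{constant_value}' AS {column_name}\n"
        f"FROM `{source_table}`"
    )


# Hypothetical table names; replace them with your own project.dataset.table.
sql = build_add_series_id_sql(
    "my_project.sales.orders_daily",
    "my_project.sales.orders_daily_with_id",
)
print(sql)
# The statement can be pasted into the BigQuery console, or run with the
# google-cloud-bigquery client: bigquery.Client().query(sql).result()
```

With the new table as the dataset source, `series_id` should be selectable as the time series identifier column, which leaves `orders` free to be picked as the target.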