google-cloud-platform, google-cloud-automl, google-cloud-vertex-ai

How to select a target column in a Vertex AI AutoML time series model


I am testing out Google Cloud Vertex AI with a time series AutoML model.

I have created a dataset from a BigQuery table with two columns: a timestamp and a numeric value I want to predict.

salesorderdate is my TIMESTAMP column and orders is the value I want to predict.

When I proceed to the next step, I cannot select orders as the value to predict; no options are available for this field.

What am I missing here? Surely the time series value is the target value in this case? Is there an expectation of more fields here, and can one actually add additional features as columns to a time series model in this way?


Solution

  • I gather from your question that you are using forecasting models. Note that this feature is in the "Preview" product launch stage, with all the limitations that implies.

    In the documentation, under Training data structure, you will find the following information:

    • There must be at least two and no more than 1,000 columns.

    For datasets that train AutoML models, one column must be the target, and there must be at least one feature available to train the model. If the training data does not include the target column, Vertex AI cannot associate the training data with the desired result.

    I suppose you are using AutoML models, so in this situation you need three columns in the dataset:

    • Time column - used to place the observation represented by that row in time
    • Time series identifier column - needed because "Forecasting training data usually includes multiple time series"
    • Target column - the value that the model should learn to predict

    If you want to predict orders, it should be the target column. But by the time you reach the target selection, the time series identifier column has already been chosen in the previous step, so with only two columns there is nothing left to select.

    So you need to add at least one more column to your BigQuery table to serve as the time series identifier column. It can hold the same value in every row. This is described in Forecasting data preparation best practices:

    You can train a forecasting model on a single time series (in other words, the time series identifier column contains the same value for all rows). However, Vertex AI is a better fit for training data that contains two or more time series. For best results, you should have at least 10 time series for every column used to train the model.
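As a rough illustration of the single-series workaround described above, here is a minimal Python sketch that adds a constant time series identifier to each row before the data is loaded back into a table. The column name `series_id`, the value `all_orders`, and the sample rows are all hypothetical; only `salesorderdate` and `orders` come from the question. It uses plain Python rather than any Vertex AI or BigQuery client call, so treat it as a sketch of the idea, not the required method.

```python
import csv
import io

# Hypothetical column name and value; any constant would serve the same purpose.
SERIES_ID_COLUMN = "series_id"
SERIES_ID_VALUE = "all_orders"


def add_series_id(rows, column=SERIES_ID_COLUMN, value=SERIES_ID_VALUE):
    """Return copies of the rows with a constant time series identifier added."""
    return [{**row, column: value} for row in rows]


def to_csv(rows):
    """Serialize the rows to CSV text, e.g. to use as a new dataset source."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data using the column names from the question.
sample = [
    {"salesorderdate": "2021-01-01 00:00:00", "orders": 12},
    {"salesorderdate": "2021-01-02 00:00:00", "orders": 15},
]

with_id = add_series_id(sample)
print(to_csv(with_id))
```

With the extra column in place, the dataset has three columns, so one can serve as the time column, one as the time series identifier, and `orders` remains available as the target. The same result could presumably be obtained directly in BigQuery by selecting the existing columns plus a string literal aliased as the identifier column.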