The docs say:
The value of the key (the values in the column) must be in RFC 3339 date-time format, where time-offset = “Z” (e.g. 1985-04-12T23:20:50.52Z)
The dataset that I'm pointing to is a CSV in cloud storage, where the data is in the format suggested by the docs:
$ gsutil cat gs://my-data.csv | head | xsv select TS_SPLIT_COL
TS_SPLIT_COL
2021-01-18T00:00:00.00Z
2021-01-18T00:00:00.00Z
2021-01-04T00:00:00.00Z
2021-03-06T00:00:00.00Z
2021-01-15T00:00:00.00Z
2021-02-11T00:00:00.00Z
2021-02-05T00:00:00.00Z
2021-05-20T00:00:00.00Z
2021-01-05T00:00:00.00Z
But when I try to run a training job, it fails with this error:
Training pipeline failed with error message: The timestamp column must have valid timestamp entries.
EDIT: this should hopefully make it more reproducible
data: https://pastebin.com/qEDqvzX6
Code I'm running:
from google.cloud import aiplatform

PROJECT = "my-project"
DATASET_ID = "dataset-id"  # points to CSV

aiplatform.init(project=PROJECT)
dataset = aiplatform.TabularDataset(DATASET_ID)

job = aiplatform.AutoMLTabularTrainingJob(
    display_name="so-58454722",
    optimization_prediction_type="classification",
    optimization_objective="maximize-au-roc",
)

model = job.run(
    dataset=dataset,
    model_display_name="so-58454722",
    target_column="Y",
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    timestamp_split_column_name="TS_SPLIT_COL",
)
Try this timestamp format instead:
2022-03-18T01:23:45.123456+00:00
It uses +00:00 instead of Z to specify the timezone.
This change will eliminate the "The timestamp column must have valid timestamp entries." error.
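If the CSV already exists with Z-suffixed values, one way to rewrite the column before re-uploading is sketched below. It uses only the Python standard library. The helper names `to_offset_format` and `convert_column` are hypothetical, and the sketch assumes every value has a fractional-seconds part like the sample rows in the question. It only reformats the strings; whether the training job accepts the result is something to verify against your own dataset.

```python
import csv
from datetime import datetime, timezone


def to_offset_format(ts: str) -> str:
    """Convert e.g. '2021-01-18T00:00:00.00Z' to '2021-01-18T00:00:00.000000+00:00'.

    Assumes the input always has fractional seconds and a literal trailing 'Z'.
    """
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    # Attach UTC explicitly so isoformat() emits the +00:00 offset.
    return parsed.replace(tzinfo=timezone.utc).isoformat(timespec="microseconds")


def convert_column(src, dst, column="TS_SPLIT_COL"):
    """Copy CSV rows from file-like src to dst, rewriting one timestamp column."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = to_offset_format(row[column])
        writer.writerow(row)
```

To use it, open a local copy of the CSV for reading and a new file for writing (both with `newline=""`), call `convert_column(src, dst)`, then upload the new file and point the dataset at it.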