Is there a way to monitor the console output of model training progress while a Vertex AI training job is running?
Suppose we have the following TensorFlow/Keras training code:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# train_dataset, normed_train_data and train_labels are prepared elsewhere.
model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)
])
optimizer = tf.keras.optimizers.RMSprop(0.001)
model.compile(
    loss='mse',
    optimizer=optimizer,
    metrics=['mae', 'mse']
)
EPOCHS = 1000
# Stop once val_loss has not improved for 10 consecutive epochs.
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)
early_history = model.fit(normed_train_data, train_labels,
                          epochs=EPOCHS, validation_split=0.2,
                          callbacks=[early_stop])
When we run the model training from the command line, we can see the progress in the console:
Epoch 1/1000
OMP: Info #211: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #209: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-3
OMP: Info #156: KMP_AFFINITY: 4 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 2 cores/pkg x 2 threads/core (2 total cores)
OMP: Info #213: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1
OMP: Info #249: KMP_AFFINITY: pid 1 tid 17 thread 0 bound to OS proc set 0
OMP: Info #249: KMP_AFFINITY: pid 1 tid 17 thread 1 bound to OS proc set 1
OMP: Info #249: KMP_AFFINITY: pid 1 tid 28 thread 2 bound to OS proc set 2
OMP: Info #249: KMP_AFFINITY: pid 1 tid 29 thread 3 bound to OS proc set 3
OMP: Info #249: KMP_AFFINITY: pid 1 tid 30 thread 4 bound to OS proc set 0
OMP: Info #249: KMP_AFFINITY: pid 1 tid 18 thread 5 bound to OS proc set 1
OMP: Info #249: KMP_AFFINITY: pid 1 tid 31 thread 6 bound to OS proc set 2
OMP: Info #249: KMP_AFFINITY: pid 1 tid 32 thread 7 bound to OS proc set 3
OMP: Info #249: KMP_AFFINITY: pid 1 tid 33 thread 8 bound to OS proc set 0
8/8 [==============================] - 2s 31ms/step - loss: 579.6393 - mae: 22.7661 - mse: 579.6393 - val_loss: 571.7239 - val_mae: 22.5494 - val_mse: 571.7239
Epoch 2/1000
8/8 [==============================] - 0s 7ms/step - loss: 527.9056 - mae: 21.6268 - mse: 527.9056 - val_loss: 520.5531 - val_mae: 21.3917 - val_mse: 520.5531
...
However, if we run the training as a Vertex AI training job, there appears to be no menu or option for viewing the console output, and I am not sure whether it is logged in Logs Explorer. Please help me understand how to monitor the training progress in real time.
You can view the training logs in the GCP Logs Explorer by using the query below:
resource.type="ml_job"
resource.labels.job_id="your-training-custom-job-ID"
The value for your-training-custom-job-ID is shown for the ongoing job under Vertex AI Training in the GCP console, as seen in the screenshot below.
Below is a screenshot of the Vertex AI training logs in the GCP Logs Explorer, retrieved with the above query.
You can click Jump to now to view the latest logs immediately. You can also use the Stream logs option to view log data in real time; its buffer window is adjustable, with certain trade-offs. Refer to this documentation for more information on streaming logs in the GCP Logs Explorer.
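If you would rather poll the same logs from a script than from the Logs Explorer UI, the query above can be passed to the Cloud Logging client library. The following is only a minimal sketch, assuming the google-cloud-logging package is installed and application default credentials are configured; the helper names build_job_log_filter and print_recent_logs, and the project_id, minutes and max_results parameters, are hypothetical and not part of any Vertex AI API.

```python
from datetime import datetime, timedelta, timezone


def build_job_log_filter(job_id, since=None):
    """Build the Logs Explorer filter from the answer for one training job."""
    clauses = [
        'resource.type="ml_job"',
        f'resource.labels.job_id="{job_id}"',
    ]
    if since is not None:
        # Optional lower bound on the timestamp, as an RFC 3339 string.
        clauses.append(f'timestamp>="{since.isoformat()}"')
    return " AND ".join(clauses)


def print_recent_logs(project_id, job_id, minutes=10, max_results=50):
    """Print the newest entries for the job (hypothetical helper).

    Assumes google-cloud-logging is installed and credentials are set up.
    """
    # Imported lazily so build_job_log_filter works without the library.
    from google.cloud import logging as cloud_logging

    client = cloud_logging.Client(project=project_id)
    since = datetime.now(timezone.utc) - timedelta(minutes=minutes)
    entries = client.list_entries(
        filter_=build_job_log_filter(job_id, since),
        order_by=cloud_logging.DESCENDING,  # newest entries first
        max_results=max_results,
    )
    for entry in entries:
        print(entry.timestamp, entry.payload)


if __name__ == "__main__":
    # Placeholder ID from the answer; replace it with your own job ID.
    print(build_job_log_filter("your-training-custom-job-ID"))
```

Calling print_recent_logs in a loop with a short sleep would approximate the Stream logs view, though the UI option is likely the simpler choice for interactive monitoring.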