While running a Kubeflow pipeline whose code uses TensorFlow 2.0, the following warning is displayed at the end of each epoch:
W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled
Also, after some epochs, the log output stops and the step fails with this error:
This step is in Failed state with this message: The node was low on resource: memory. Container main was using 100213872Ki, which exceeds its request of 0. Container wait was using 25056Ki, which exceeds its request of 0.
This was due to incompatible CUDA and TensorFlow versions (tensorflow-gpu 2.0.0 is built against CUDA 10.0 and cuDNN 7). The following versions work well with each other:
tensorflow-gpu==2.0.0
tensorflow-addons==0.6.0
nvidia/cuda:10.0-cudnn7-runtime
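
To catch this kind of mismatch before a long training run, a small check at the start of the pipeline step can compare the installed packages against the pins above. This is only a sketch: `EXPECTED` and `find_mismatches` are names made up for illustration, and it checks the Python packages only, not the CUDA base image.

```python
try:
    # Available in the standard library from Python 3.8.
    from importlib.metadata import version as _installed_version
except ImportError:
    # Fallback for older interpreters; assumes setuptools is installed.
    import pkg_resources

    def _installed_version(name):
        return pkg_resources.get_distribution(name).version

# Pins copied from the list above (hypothetical constant name).
EXPECTED = {
    "tensorflow-gpu": "2.0.0",
    "tensorflow-addons": "0.6.0",
}


def find_mismatches(expected, lookup=_installed_version):
    """Return {package: (expected, installed)} for every pin that is not met."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except Exception:
            # Any lookup failure is treated as "package not installed".
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(EXPECTED).items():
        print("{}: expected {}, found {}".format(name, wanted, installed))
```

If the script prints nothing, the installed packages match the pins. The `lookup` parameter exists only so the check can be exercised without the packages being installed.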