Tags: tensorflow2.0, kubeflow, kubeflow-pipelines

Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled


While running a Kubeflow pipeline whose code uses TensorFlow 2.0, the error below is displayed at the end of each epoch:

W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled

Also, after some epochs, the step stops producing logs and fails with this error:

This step is in Failed state with this message: The node was low on resource: memory. Container main was using 100213872Ki, which exceeds its request of 0. Container wait was using 25056Ki, which exceeds its request of 0.
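The message reports memory in Kubernetes binary units (Ki = 1024 bytes), and a request of 0 means the container declared no memory request. Converting the figure makes the scale clearer. The helper names below (`quantity_to_bytes`, `to_gib`) are hypothetical, and this minimal sketch handles only the binary suffixes Ki/Mi/Gi/Ti plus plain byte counts, not every quantity format Kubernetes accepts:

```python
# Binary suffixes used in Kubernetes memory quantities (Ki = 2**10 bytes, etc.).
# Decimal suffixes (k, M, G) and exponent notation are deliberately not handled.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def quantity_to_bytes(quantity):
    """Convert a quantity such as '100213872Ki' to a byte count."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count, e.g. the request of '0'


def to_gib(quantity):
    """Express a binary-suffixed quantity in GiB."""
    return quantity_to_bytes(quantity) / 2**30


# Usage on the value from the error message for container "main".
print(f"main: {to_gib('100213872Ki'):.1f} GiB")  # prints "main: 95.6 GiB"
```

In other words, the `main` container had grown to roughly 95.6 GiB before the node ran low on memory.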


Solution

  • This was caused by incompatible CUDA and TensorFlow versions. The following versions work well together:

    tensorflow-gpu==2.0.0

    tensorflow-addons==0.6.0

    nvidia/cuda:10.0-cudnn7-runtime
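To confirm that a pipeline image actually contains the pip packages listed above, a small check can run at container start-up. This is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); the helper names `find_mismatches` and `installed_versions` are hypothetical, and the CUDA/cuDNN pairing comes from the base image tag, so pip cannot verify it:

```python
from importlib import metadata

# Pip-installable versions from the answer above. The CUDA 10.0 / cuDNN 7
# pairing is fixed by the base image (nvidia/cuda:10.0-cudnn7-runtime).
PINNED = {
    "tensorflow-gpu": "2.0.0",
    "tensorflow-addons": "0.6.0",
}


def find_mismatches(installed, pinned=PINNED):
    """Return {package: (found_version_or_None, expected)} for every unmet pin."""
    mismatches = {}
    for package, expected in pinned.items():
        found = installed.get(package)
        if found != expected:
            mismatches[package] = (found, expected)
    return mismatches


def installed_versions(packages):
    """Look up installed distribution versions; missing packages map to None."""
    versions = {}
    for package in packages:
        try:
            versions[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            versions[package] = None
    return versions


if __name__ == "__main__":
    problems = find_mismatches(installed_versions(PINNED))
    for package, (found, expected) in problems.items():
        print(f"{package}: found {found}, expected {expected}")
    if not problems:
        print("Pinned versions match.")
```

Failing fast on a mismatch surfaces a version problem at step start-up instead of several epochs into training.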