google-cloud-platform, tfrecord, gcp-ai-platform-training, kedro, tf.data.dataset

Does Kedro support TFRecord?


To train TensorFlow Keras models on AI Platform using Docker containers, we convert raw images stored on GCS into a TFRecord dataset using tf.data.Dataset. The data is never stored locally; the raw images are transformed directly into TFRecord files that are written to another bucket. Is it possible to use Kedro with a TFRecord dataset and the streaming capability of tf.data.Dataset? According to the docs, Kedro doesn't seem to support TFRecord datasets.


Solution

  • The only TensorFlow-related dataset we have at the moment is TensorFlowModelDataset (https://kedro.readthedocs.io/en/latest/_modules/kedro/extras/datasets/tensorflow/tensorflow_model_dataset.html), but you can easily add your own custom dataset, or open a feature request or submit a contribution in the repo.