I am studying Kubeflow Pipelines and how the different components of a pipeline are linked to each other. For this, I am using the MNIST example project available in the official GitHub repository. However, I am not able to understand the difference between vop.volume and mnist_training_container.pvolume in the code snippet below. From the documentation for dsl.VolumeOp.add_volume, I assume that vop.volume is a Kubernetes volume, but I am unclear about what pvolume is, why it is linked to the training container, and how the two differ.
vop = dsl.VolumeOp(
    name="create_volume",
    resource_name="data-volume",
    size="500Mi",
    modes=dsl.VOLUME_MODE_RWM)

# Create MNIST training component.
# train_op is from func_to_container_op which returns a kfp.dsl.ContainerOp.
# To this container we assign a K8 volume using add_pvolumes.
mnist_training_container = train_op(data_path, model_file) \
    .add_pvolumes({data_path: vop.volume})

# Create MNIST prediction component.
mnist_predict_container = predict_op(data_path, model_file, image_number) \
    .add_pvolumes({data_path: mnist_training_container.pvolume})
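pvolume is a somewhat unusual concept that does not fit well with the rest of KFP. The idea was that a volume is "passed" between components in the same way as normal outputs, even though it is actually the same volume throughout. Mounting mnist_training_container.pvolume instead of vop.volume therefore gives the prediction step the same volume, but also makes it depend on the training step.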
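To make the "same volume, passed along like an output" idea concrete, here is a toy model in plain Python. It is only a sketch of the concept, not KFP's implementation: ToyPVolume, ToyStep, and the /mnt mount path are hypothetical names invented for illustration, and the only thing the model tracks is which volume is mounted and which steps must run first.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ToyPVolume:
    """Hypothetical handle: a volume name plus the steps that must finish first."""
    volume_name: str
    depends_on: Tuple[str, ...] = ()

    def after(self, step_name: str) -> "ToyPVolume":
        # Same underlying volume, with one more upstream step recorded.
        return ToyPVolume(self.volume_name, self.depends_on + (step_name,))


class ToyStep:
    """Hypothetical stand-in for a pipeline step that can mount volumes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.depends_on: List[str] = []
        self.mounts: Dict[str, str] = {}
        self.pvolume: Optional[ToyPVolume] = None

    def add_pvolumes(self, pvolumes: Dict[str, ToyPVolume]) -> "ToyStep":
        for mount_path, pvol in pvolumes.items():
            # Mount the underlying volume at the requested path.
            self.mounts[mount_path] = pvol.volume_name
            # Inherit every dependency carried by the incoming handle.
            self.depends_on.extend(pvol.depends_on)
            # Expose a new handle that additionally depends on this step.
            self.pvolume = pvol.after(self.name)
        return self


# Stand-in for vop.volume: the volume right after the creation step.
created = ToyPVolume("data-volume", ("create_volume",))

train = ToyStep("train").add_pvolumes({"/mnt": created})
predict = ToyStep("predict").add_pvolumes({"/mnt": train.pvolume})

print(train.mounts == predict.mounts)  # True: both mount the same volume
print(train.depends_on)                # ['create_volume']
print(predict.depends_on)              # ['create_volume', 'train']
```

In this model, mounting created directly in the prediction step would give it the same data-volume but no dependency on train, which is the practical difference the question asks about.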
We advise our users to avoid the pvolume feature and to avoid using volumes in components. Otherwise, the components and pipelines are not portable and have limited usability.
Please check out the samples, tutorials, and components; almost no pipelines use volumes. In particular, see the two tutorials for Python and shell components, and look at how pipelines usually look, for example the XGBoost training pipeline.
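As a rough illustration of the volume-free style, the sketch below uses plain Python functions that only read from and write to file paths handed to them. It does not use the KFP SDK: the function names, the fake model contents, and the temporary-directory driver are all assumptions made for this example. In the KFP v1 SDK, such path parameters would typically be annotated with kfp.components.InputPath and kfp.components.OutputPath, and the pipeline system, not the component, would choose the actual locations.

```python
import tempfile
from pathlib import Path


def train(model_path: str) -> None:
    """Hypothetical training step: writes a fake model to the path it is given."""
    Path(model_path).write_text("weights=0.5")


def predict(model_path: str, image_number: int) -> str:
    """Hypothetical prediction step: reads the model from the path it is given."""
    model = Path(model_path).read_text()
    return f"image {image_number} scored with {model}"


# Local driver standing in for the pipeline system: it decides where the
# intermediate file lives, so neither step hard-codes a volume or mount path.
with tempfile.TemporaryDirectory() as workdir:
    model_path = str(Path(workdir) / "model.txt")
    train(model_path)
    result = predict(model_path, image_number=3)

print(result)  # image 3 scored with weights=0.5
```

Because neither function knows about a shared volume, the same code can run wherever the caller can supply a readable and writable path, which is the portability the answer is arguing for.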