apache-beam, spring-cloud-dataflow

Apache Beam: PTransform vs PValue


Given a PTransform<PCollection<X>, PCollection<Y>> for arbitrary types X and Y, what exactly is the transform and what exactly is the PValue in this example? Does a PValue define the last vertex in a graph?


Solution

  • PValue is a common base class for the various things that can be inputs and outputs of a PTransform. In your example, PCollection<X> is the input PValue and PCollection<Y> is the output PValue, so a PValue is not limited to the last vertex of a graph. PCollection is the most common example. Others are the trivial PBegin and PDone, and PCollectionTuple (a transform can return multiple PCollections, as ParDo.withOutputTags does). It is also possible to define custom PValues, though this is very rarely needed unless you're a library author, e.g. see here.
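
To make the relationship concrete without needing Beam on the classpath, here is a toy model in plain Java. The `ToyPValue`, `ToyPCollection`, and `ToyPTransform` types and the `mapEach` helper are hypothetical stand-ins invented for this sketch. They only mimic the shape of Beam's `PValue`, `PCollection`, and `PTransform` and are not Beam's implementation (a real pipeline is deferred and distributed, while this runs eagerly on an in-memory list).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class PValueSketch {

    // Hypothetical stand-in for PValue: anything a transform can consume or produce.
    interface ToyPValue {}

    // Hypothetical stand-in for PCollection<T>: the most common kind of value.
    static final class ToyPCollection<T> implements ToyPValue {
        final List<T> elements;

        ToyPCollection(List<T> elements) {
            this.elements = List.copyOf(elements);
        }

        // Mimics the shape of PCollection.apply(transform): hand this value to a transform.
        <OutT extends ToyPValue> OutT apply(ToyPTransform<ToyPCollection<T>, OutT> transform) {
            return transform.expand(this);
        }
    }

    // Hypothetical stand-in for PTransform<InputT, OutputT>: a step from one value to another.
    interface ToyPTransform<InT extends ToyPValue, OutT extends ToyPValue> {
        OutT expand(InT input);
    }

    // Builds a transform from ToyPCollection<X> to ToyPCollection<Y> out of a per-element function.
    static <X, Y> ToyPTransform<ToyPCollection<X>, ToyPCollection<Y>> mapEach(Function<X, Y> fn) {
        return input -> {
            List<Y> out = new ArrayList<>();
            for (X x : input.elements) {
                out.add(fn.apply(x));
            }
            return new ToyPCollection<>(out);
        };
    }

    static List<Integer> demo() {
        // Input value (X = String).
        ToyPCollection<String> words = new ToyPCollection<>(List.of("a", "bb", "ccc"));
        // The transform: the operation sitting between two values.
        ToyPTransform<ToyPCollection<String>, ToyPCollection<Integer>> toLengths =
                mapEach(String::length);
        // Output value (Y = Integer).
        ToyPCollection<Integer> lengths = words.apply(toLengths);
        return lengths.elements;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [1, 2, 3]
    }
}
```

In this sketch the values (`words`, `lengths`) sit on both sides of the transform (`toLengths`), which mirrors how the two type parameters of `PTransform<PCollection<X>, PCollection<Y>>` name the input and the output.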