apache-beam

Can reporting metrics in a dedicated PTransform cause performance issues?


I'm currently working on a data processing pipeline using Apache Beam, and I'm interested in best practices for reporting metrics from a specific PTransform. In this case, the PTransform extracts a metric from the input PCollection and outputs its input elements unchanged.

I'm considering two options: extracting and reporting the metric inside the next step of the pipeline, or creating a separate PTransform just for it. However, I'm concerned that adding an extra PTransform could hurt the overall performance of the pipeline.


Solution

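  • Runners generally fuse adjacent transforms into a single stage (for example, Dataflow's producer-consumer fusion), so elements are handed from one step to the next in memory, without being serialized or shuffled in between. As a result, the overhead of computing the metric in the same PTransform versus in a separate, subsequent one is usually negligible.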