I'm currently working on a data processing pipeline using Apache Beam and I'm interested in best practices for reporting metrics in a specific PTransform. In this case, the PTransform would extract a metric from the input PCollection and emit each input element unchanged.
I'm considering two options: either extracting and reporting the metric in the next step of the pipeline, or creating a separate PTransform for it. However, I'm concerned about the impact of adding an extra PTransform on the overall performance of the pipeline.
Adjacent transforms are generally fused by the runner into a single stage, which makes the overhead of putting something in the same PTransform or a subsequent one negligible.