Search code examples
javaapache-beam

Java Apache Beam ProcessElement Method have to be void?


In Java Apache Beam , does the @ProcessElement method required to be void? Or can it return an int, string or class?

We are doing unit tests, and want to validate the output of methods. I know there are mockito and spy functions. However, prefer to use simple input and output methods. Just wondering if there are negative consequences of not using void ?

We don't have any actual outputs in our method (which are sent to a next streaming phase), example our method saves internally to a database.

@ProcessElement
public void processData(ProcessContext context, @Element KV<String, TestItem> input) {
   String outputText;
   ...
   ...
   return outputText;

Solution

  • Yes, it has to return void, see the docs here: https://beam.apache.org/releases/javadoc/2.0.0/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html

    The signature of this method must satisfy the following constraints:

    • Its first argument must be a DoFn.ProcessContext.
    • If one of its arguments is a subtype of RestrictionTracker, then it is a splittable DoFn subject to the separate requirements described below. Items below are assuming this is not a splittable DoFn.
    • If one of its arguments is a subtype of BoundedWindow then it will be passed the window of the current element. When applied by ParDo the subtype of BoundedWindow must match the type of windows on the input PCollection. If the window is not accessed a runner may perform additional optimizations.
    • It must return void.