Search code examples
javatestingtagsgoogle-cloud-dataflowjunit4

Apache Beam Test Fails Due to Unknown Output Tag Exception in Java SDK


I'm working on an Apache Beam project using the Java SDK, and I've encountered a persistent issue during testing. My pipeline uses a DoFn that emits elements to a main output and a side output. However, when running my tests, I'm receiving an IllegalArgumentException indicating an "Unknown output tag."

Here's the stack trace that pinpoints the problem:

Caused by: java.lang.IllegalArgumentException: Unknown output tag Tag<com.myproject.SomeClassWithDefinedTags#10>
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:216)
    at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner.outputWindowedValue(SimpleDoFnRunner.java:274)
    at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner.access$900(SimpleDoFnRunner.java:85)
    at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:423)
    at org.apache.beam.sdk.transforms.DoFnOutputReceivers$WindowedContextOutputReceiver.output(DoFnOutputReceivers.java:76)
    at myProject.DoFn.processElement(DoFn.java:136)

I've double-checked that my TupleTags are imported and utilized correctly in the test. Here's a snippet for context:

import static com.myproject.transform.SomeClassWithDefinedTags.SuccessTag;
import static com.myproject.transform.SomeClassWithDefinedTags.FailTag; 

@Test
    @Category(NeedsRunner.class)
    public void testProcessElement() {
        String testInput = "";
        List<String> input = List.of(testInput);
        PCollection<String> output =
                p.apply(Create.of(input))
                        .apply(ParDo.of(new DoFn(SuccessTag, FailTag)));
        PAssert.that(output).containsInAnyOrder("");
        p.run().waitUntilFinish();
    }

The MyDoFn class is structured to output to SuccessTag and FailTag tags accordingly. Despite ensuring that the tags are consistent, the test fails with the above exception. The DoFn is also used in the main pipeline without any issues.

Could someone help me understand what might be causing this "Unknown output tag" exception in the test environment and how to resolve it?


Solution

  • There are two places to start checking. First, be sure to add .withOutputTags() to your ParDo.

    Second, its possible this is being run on different runners in the test versus the main pipeline, which can cause inconsistent behavior.