Tags: eclipse, maven, google-cloud-dataflow, apache-beam, direct-runner

How to debug Dataflow/Apache Beam pipeline DoFn functions in Eclipse using the direct runner


I want to run my pipeline with the direct runner in Eclipse, put a breakpoint in my DoFn functions, and debug their execution. I tried to set up the direct runner with the following steps:

  1. Add the direct runner Maven package.
  2. Set up a Maven profile for the direct runner in pom.xml. My pom.xml has this profile:

    <profiles>
      <profile>
        <id>direct-runner</id>
        <activation>
          <activeByDefault>true</activeByDefault>
        </activation>
        <dependencies>
          <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-runners-direct-java</artifactId>
            <version>0.2.0-incubating</version>
          </dependency>
        </dependencies>
      </profile>
    </profiles>

  3. I have this Maven plugin under pluginManagement in my pom.xml:

    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>org.codehaus.mojo</groupId>
          <artifactId>exec-maven-plugin</artifactId>
          <version>1.4.0</version>
          <executions>
            <execution>
              <goals>
                <goal>java</goal>
              </goals>
            </execution>
          </executions>
          <configuration>
            <cleanupDaemonThreads>false</cleanupDaemonThreads>
            <mainClass>com.MyMainClass</mainClass>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>

  4. I created an Eclipse debug configuration to launch the pipeline.

When I run it with this debug configuration, the job starts on GCP Dataflow instead of in local JVM threads, and my breakpoints are never hit.
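The question doesn't show where the runner is selected. One plausible cause (an assumption) is that the debug configuration passes something like `--runner=DataflowRunner`, or the main class sets a Dataflow runner in code, which takes precedence over simply having the direct runner on the classpath. A minimal sketch of a `com.MyMainClass`-style entry point that pins the runner for local debugging is below. It assumes a recent Beam 2.x SDK (much newer than the 0.2.0-incubating version in the profile); `UpperCaseFn` and `createDebugOptions` are hypothetical names.

```java
import org.apache.beam.runners.direct.DirectRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

// Sketch of an entry point like com.MyMainClass (add "package com;" in a real project).
public class MyMainClass {

  // Hypothetical DoFn: an Eclipse breakpoint inside processElement is hit
  // only when the pipeline executes in this JVM (i.e. on the DirectRunner).
  static class UpperCaseFn extends DoFn<String, String> {
    @ProcessElement
    public void processElement(@Element String word, OutputReceiver<String> out) {
      out.output(word.toUpperCase());
    }
  }

  // Hypothetical helper: parses the launch arguments, then pins the runner to
  // DirectRunner, overriding any --runner value that was parsed from them.
  static PipelineOptions createDebugOptions(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    options.setRunner(DirectRunner.class);
    return options;
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(createDebugOptions(args));
    p.apply(Create.of("a", "b", "c"))
        .apply(ParDo.of(new UpperCaseFn()));
    // Wait for the local run to complete before main returns.
    p.run().waitUntilFinish();
  }
}
```

Because `setRunner` is called after parsing, it wins over any `--runner` argument, but the remaining arguments must still be parseable by `PipelineOptionsFactory`. Removing the override lets the launch arguments choose the runner again.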

Solution

  • The problem is probably the way you create the pipeline in your test methods. Try creating the pipeline with the TestPipeline utility class, like this:

    public TestPipeline p = TestPipeline.create();
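A fuller version of the TestPipeline approach, written as a JUnit 4 test, might look like this. It is a sketch assuming a recent Beam 2.x SDK with JUnit 4 and the direct runner on the test classpath; `UpperCaseFn` and the test name are hypothetical. In Beam 2.x the TestPipeline field is declared as a JUnit `@Rule`, and with no `beamTestPipelineOptions` system property set it should fall back to the DirectRunner, so launching the test with Eclipse's Debug As > JUnit Test should stop at breakpoints inside the DoFn.

```java
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.junit.Rule;
import org.junit.Test;

public class UpperCaseFnTest {

  // Hypothetical DoFn under test; set a breakpoint on the out.output(...) line.
  static class UpperCaseFn extends DoFn<String, String> {
    @ProcessElement
    public void processElement(@Element String word, OutputReceiver<String> out) {
      out.output(word.toUpperCase());
    }
  }

  // Beam 2.x expects TestPipeline as a JUnit rule; transient keeps it out of
  // serialization if an anonymous DoFn captures the test instance.
  @Rule public final transient TestPipeline p = TestPipeline.create();

  @Test
  public void upperCasesEachElement() {
    PCollection<String> output =
        p.apply(Create.of("a", "b")).apply(ParDo.of(new UpperCaseFn()));
    // Assertions are checked when the pipeline runs.
    PAssert.that(output).containsInAnyOrder("A", "B");
    p.run().waitUntilFinish();
  }
}
```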