java, jdbc, apache-beam, apache-beam-io

Cloud Dataflow to Cloud SQL: DataflowRunner gives NullPointerException


I'm trying to process a considerable number of records using Cloud Dataflow. My source is Google Cloud Storage and my sink is Cloud SQL (MySQL). I use the following code to write to the sink (Cloud SQL).

p.apply()
....
.withDataSourceConfiguration(
    JdbcIO.DataSourceConfiguration.create(
        "com.mysql.cj.jdbc.Driver",
        "jdbc:mysql://google/<DBNAME>?cloudSqlInstance=<INSTANCE_NAME>&socketFactory=com.google.cloud.sql.mysql.SocketFactory&user=<USERNAME>&password=<PASSWORD>&useSSL=false"
    )
)
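As a side note on the connection string above: credentials containing characters such as `&` or `=` would break the query part of the URL if pasted in verbatim. The sketch below assembles the same URL with the user and password percent-encoded. The class name `CloudSqlJdbcUrl` and its `build` method are hypothetical helpers, not part of Beam or the Cloud SQL socket factory, and the sketch assumes the MySQL driver decodes query values before using them. It is unrelated to the NullPointerException itself.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not a Beam or Cloud SQL API): builds the JDBC URL
// used in the question from its individual parts.
final class CloudSqlJdbcUrl {
    private CloudSqlJdbcUrl() {}

    static String build(String dbName, String instanceName, String user, String password) {
        // The instance name is left as-is, exactly as in the question's URL.
        // Only user and password are encoded; this assumes the driver
        // percent-decodes query values.
        return "jdbc:mysql://google/" + dbName
            + "?cloudSqlInstance=" + instanceName
            + "&socketFactory=com.google.cloud.sql.mysql.SocketFactory"
            + "&user=" + encode(user)
            + "&password=" + encode(password)
            + "&useSSL=false";
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

The resulting string could then be passed as the second argument to `JdbcIO.DataSourceConfiguration.create(...)` in place of the hand-written URL.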

The above works fine when I run the pipeline with DirectRunner, but it throws a NullPointerException when run with DataflowRunner. The exception is as follows:

java.lang.NullPointerException
    at org.apache.beam.sdk.io.jdbc.JdbcIO$Write$WriteFn.executeBatch(JdbcIO.java:775)
    at org.apache.beam.sdk.io.jdbc.JdbcIO$Write$WriteFn.finishBundle(JdbcIO.java:755)

Beam versions tried: 2.16.0 and 2.15.0; both fail. Why does this happen, and how can I make it work with DataflowRunner?


Solution

  • I have resolved this. The NullPointerException is thrown for a different reason, in finishBundle