google-cloud-data-fusion

How to fix "java.lang.NullPointerException: null" when doing MSSQL to BigQuery in Cloud Data Fusion


I'm working on a Cloud Data Fusion POC, attempting to create an MSSQL to BigQuery pipeline. The connection works, since I'm able to import my schema from a query. However, the run fails with a MapReduce Program "phase-1" error and a `java.lang.NullPointerException: null` exception.

I've already attempted to use the generic database source and sink, as suggested in this question: Getting Null Pointer Exception when mapping SQL Server Database to MySQL Database with MapReduce. The import query is also already specified.

Here's the full stack trace:

java.lang.NullPointerException: null
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:770) ~[com.google.guava.guava-13.0.1.jar:na]
    at com.google.common.collect.Lists$TransformingSequentialList.<init>(Lists.java:569) ~[com.google.guava.guava-13.0.1.jar:na]
    at com.google.common.collect.Lists.transform(Lists.java:553) ~[com.google.guava.guava-13.0.1.jar:na]
    at com.google.cloud.bigquery.FieldList.fromPb(FieldList.java:116) ~[na:na]
    at com.google.cloud.bigquery.Schema.fromPb(Schema.java:107) ~[na:na]
    at com.google.cloud.bigquery.TableDefinition$Builder.table(TableDefinition.java:120) ~[na:na]
    at com.google.cloud.bigquery.StandardTableDefinition.fromPb(StandardTableDefinition.java:220) ~[na:na]
    at com.google.cloud.bigquery.TableDefinition.fromPb(TableDefinition.java:155) ~[na:na]
    at com.google.cloud.bigquery.TableInfo$BuilderImpl.<init>(TableInfo.java:183) ~[na:na]
    at com.google.cloud.bigquery.Table.fromPb(Table.java:603) ~[na:na]
    at com.google.cloud.bigquery.BigQueryImpl.getTable(BigQueryImpl.java:415) ~[na:na]
    at io.cdap.plugin.gcp.bigquery.util.BigQueryUtil.getBigQueryTable(BigQueryUtil.java:131) ~[na:na]
    at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.validateSchema(AbstractBigQuerySink.java:291) ~[na:na]
    at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.initOutput(AbstractBigQuerySink.java:119) ~[na:na]
    at io.cdap.plugin.gcp.bigquery.sink.BigQuerySink.prepareRunInternal(BigQuerySink.java:80) ~[na:na]
    at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.prepareRun(AbstractBigQuerySink.java:88) ~[na:na]
    at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.prepareRun(AbstractBigQuerySink.java:57) ~[na:na]
    at io.cdap.cdap.etl.common.plugin.WrappedBatchSink.lambda$prepareRun$0(WrappedBatchSink.java:52) ~[na:na]
    at io.cdap.cdap.etl.common.plugin.Caller$1.call(Caller.java:30) ~[na:na]
    at io.cdap.cdap.etl.common.plugin.StageLoggingCaller.call(StageLoggingCaller.java:40) ~[na:na]
    at io.cdap.cdap.etl.common.plugin.WrappedBatchSink.prepareRun(WrappedBatchSink.java:51) ~[na:na]
    at io.cdap.cdap.etl.common.plugin.WrappedBatchSink.prepareRun(WrappedBatchSink.java:37) ~[na:na]
    at io.cdap.cdap.etl.common.submit.SubmitterPlugin.lambda$prepareRun$2(SubmitterPlugin.java:71) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.AbstractContext$2.run(AbstractContext.java:551) ~[na:na]
    at io.cdap.cdap.data2.transaction.Transactions$CacheBasedTransactional.finishExecute(Transactions.java:224) ~[na:na]
    at io.cdap.cdap.data2.transaction.Transactions$CacheBasedTransactional.execute(Transactions.java:211) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:546) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:534) ~[na:na]
    at io.cdap.cdap.etl.common.submit.SubmitterPlugin.prepareRun(SubmitterPlugin.java:69) ~[na:na]
    at io.cdap.cdap.etl.batch.PipelinePhasePreparer.prepare(PipelinePhasePreparer.java:111) ~[na:na]
    at io.cdap.cdap.etl.batch.mapreduce.MapReducePreparer.prepare(MapReducePreparer.java:97) ~[na:na]
    at io.cdap.cdap.etl.batch.mapreduce.ETLMapReduce.initialize(ETLMapReduce.java:192) ~[na:na]
    at io.cdap.cdap.api.mapreduce.AbstractMapReduce.initialize(AbstractMapReduce.java:109) ~[na:na]
    at io.cdap.cdap.api.mapreduce.AbstractMapReduce.initialize(AbstractMapReduce.java:32) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService$1.initialize(MapReduceRuntimeService.java:182) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService$1.initialize(MapReduceRuntimeService.java:177) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.AbstractContext.lambda$initializeProgram$1(AbstractContext.java:640) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:600) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.AbstractContext.initializeProgram(AbstractContext.java:637) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService.beforeSubmit(MapReduceRuntimeService.java:547) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService.startUp(MapReduceRuntimeService.java:226) ~[na:na]
    at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) ~[com.google.guava.guava-13.0.1.jar:na]
    at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService$2$1.run(MapReduceRuntimeService.java:450) [na:na]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_212]
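Reading the top frames, the NPE is thrown by Guava's `Preconditions.checkNotNull` inside `Lists.transform`, which `FieldList.fromPb` calls on the field list of the existing target table during the sink's schema validation. That suggests the table's schema is coming back with no fields. A minimal stdlib-only sketch of the same failing pattern (hypothetical names; the real code path uses Guava's `Preconditions.checkNotNull` rather than `Objects.requireNonNull`):

```java
import java.util.List;
import java.util.Objects;

public class SchemaNpeSketch {
    // Analogue of FieldList.fromPb: it hands the table's field list to a
    // transform that null-checks its input first. If the existing table's
    // schema has no fields, the list is null and the check throws
    // NullPointerException before any mapping happens.
    static List<String> fromPb(List<String> pbFields) {
        Objects.requireNonNull(pbFields);
        return pbFields;
    }

    static String demo() {
        try {
            fromPb(null); // an existing target table with an empty schema
            return "no exception";
        } catch (NullPointerException e) {
            return "java.lang.NullPointerException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "java.lang.NullPointerException: null"
    }
}
```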

Solution

  • Can you try deleting the target dataset and letting the pipeline auto-create it? I suspect the pipeline is trying to write to an existing table whose schema has no fields.
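One way to check and apply this is with the `bq` CLI (a sketch; the project, dataset, and table names are placeholders for your own):

```shell
# Placeholder names -- substitute your own project, dataset, and table.
PROJECT="my-project"
DATASET="my_dataset"
TABLE="my_table"

if command -v bq >/dev/null 2>&1; then
  # Inspect the existing target table; an empty or missing "fields" list
  # in its schema matches the NPE thrown in FieldList.fromPb above.
  bq show --format=prettyjson "${PROJECT}:${DATASET}.${TABLE}"

  # Delete the empty table (or the whole dataset with `bq rm -r -f -d`)
  # so the BigQuery sink can auto-create it from the pipeline's output
  # schema on the next run.
  bq rm -f -t "${PROJECT}:${DATASET}.${TABLE}"
else
  echo "bq CLI not installed; run these commands from Cloud Shell instead"
fi
```

Re-run the pipeline afterwards and let the BigQuery sink create the table itself.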