Tags: google-cloud-platform, google-bigquery, apache-beam, dataflow

Apache Beam: writing status information after BQ writes are done within Dataflow


I am struggling to find a good way to write a status right after the BigQuery writes are done.

Each Dataflow job has to process one file, and if no errors occurred, a status should be written to Firestore.

I have a code that looks like this:

PCollection<TableRow> failedInserts = results.getFailedInserts();

    failedInserts
        .apply("Set Global Window",
            Window.<TableRow>into(new GlobalWindows()))
        .apply("Count failures", Count.globally())
        .apply(ParDo.of(new DoFn<Long, ReportStatusInfo>() {

          @ProcessElement
          public void processElement(final ProcessContext c) throws IOException {
            Long numberOfErrors = c.element();
            if (numberOfErrors > 0) {
              // set status to failed
            } else {
              // set status to ok
            }
            insert();
          }
        }));

It does not seem to work correctly: I have the impression that it fires before the whole BigQuery write has finished.

Any other ideas on how to solve my problem in Dataflow, or why the above does not work?


Solution

  • The getFailedInserts method is only supported when using streaming inserts, as opposed to batch file loads. In that mode, failed rows are emitted into the failed-inserts collection as the writes happen, so your code will do what you want.
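
To make the pipeline above run in that mode, the BigQuery write needs to be configured for streaming inserts explicitly. A minimal sketch of what that configuration might look like (the table spec and input collection names here are placeholders, not from the question):

    // Hedged sketch: enable streaming inserts so that
    // results.getFailedInserts() is actually supported.
    WriteResult results = rows.apply("Write to BQ",
        BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table")  // placeholder table spec
            .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
            // Retry transient failures; permanent failures end up
            // in the failed-inserts collection instead of retrying forever.
            .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    // Rows that permanently failed to insert; feed this into the
    // count-and-report logic from the question.
    PCollection<TableRow> failedInserts = results.getFailedInserts();

Note that streaming inserts trade the atomicity of a batch load job for per-row results, so the status written to Firestore reflects the count of individually failed rows rather than the success of a single load job.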