Tags: java, google-cloud-platform, google-cloud-dataflow, apache-beam, google-cloud-bigtable

ValueProvider<String> is not accepted by the pipeline for BigTable's InstanceID and TableID


I'm trying to write to BigTable from a generic Dataflow pipeline. By generic I mean it must be able to write to any BigTable table provided as a parameter at runtime, using a ValueProvider. The code compiles without errors, but when I try to create a template from it, I get the error message below:

Exception in thread "main" java.lang.IllegalStateException: Value only available at runtime, but accessed from a non-runtime context: RuntimeValueProvider{propertyName=bigTableInstanceId, default=null}

This is odd, as ValueProviders are supposed to be supported for exactly this purpose.

Below is the code I am using to write to BigTable:

results.get(btSuccessTag).apply("Write to BigTable",
                    CloudBigtableIO.writeToTable(new CloudBigtableTableConfiguration.Builder()
                            .withProjectId(options.getProject())
                            .withInstanceId(options.getBigTableInstanceId())
                            .withTableId(options.getBigTableTable())
                            .build()));

The interface defining the ValueProviders is:

public interface BTPipelineOptions extends DataflowPipelineOptions{
    @Required
    @Description("BigTable Instance Id")
    ValueProvider<String> getBigTableInstanceId();
    void setBigTableInstanceId(ValueProvider<String> bigTableInstanceId);

    @Required
    @Description("BigTable Table Destination")
    ValueProvider<String> getBigTableTable();
    void setBigTableTable(ValueProvider<String> bigTableTable);

    @Required
    @Description("BT error file path")
    ValueProvider<String> getBTErrorFilePath();
    void setBTErrorFilePath(ValueProvider<String> btErrorFilePath);
}

Please let me know if I'm missing something here.


Solution

  • Unfortunately, it seems that CloudBigtableIO's parameters have not been updated to be settable from templates via a ValueProvider. BigtableIO, however, is compatible with ValueProviders.

    In order for Dataflow templates to be able to set a parameter when a pipeline is launched from a template, the library transforms it uses (i.e. the sources and sinks) must first be updated to use ValueProviders for those parameters all the way into the library code, to the point where each parameter is actually consumed, as sketched below. See the Beam documentation on ValueProvider for more details.
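
    As a concrete illustration, here is a minimal, hypothetical sketch (not CloudBigtableIO's actual code) of a DoFn that threads a ValueProvider through correctly; the WriteWithRuntimeTable name is made up for this example:

    import org.apache.beam.sdk.options.ValueProvider;
    import org.apache.beam.sdk.transforms.DoFn;

    // Hypothetical sketch: the provider is stored at graph-construction time
    // and only resolved with get() inside @ProcessElement, where runtime
    // parameter values are available.
    class WriteWithRuntimeTable extends DoFn<String, Void> {
        private final ValueProvider<String> tableId;

        WriteWithRuntimeTable(ValueProvider<String> tableId) {
            this.tableId = tableId;  // stored, not read, during template creation
        }

        @ProcessElement
        public void processElement(ProcessContext c) {
            // Safe here: get() runs on the worker at execution time. Calling
            // get() during pipeline construction instead is what produces the
            // "Value only available at runtime" error shown above.
            String table = tableId.get();
            // ... write c.element() to `table` ...
        }
    }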

    However, we have example template pipelines which work with BigtableIO instead of CloudBigtableIO; see AvroToBigtable. So I think you have a few options:

    1. Update your custom pipeline, using one of the Bigtable template examples as a guide. Be sure to use BigtableIO instead of CloudBigtableIO (see the sketch after this list).
    2. Update CloudBigtableIO to use ValueProviders all the way through, to the point where each parameter is used. See creating_templates, and an example of proper ValueProvider usage in BigtableIO. Contribute it to Apache Beam's GitHub, or extend/modify the class locally.
    3. See if the existing Bigtable template pipelines fit your needs. You can launch them from the Dataflow UI.
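
    For option 1, here is a minimal sketch of what the write step from the question could look like with BigtableIO, which accepts ValueProviders directly. This assumes the upstream step is adapted to emit KV<ByteString, Iterable<Mutation>> (com.google.bigtable.v2.Mutation), which BigtableIO.write() expects, rather than HBase mutations; btSuccessTag and options are the names from the question:

    // Requires org.apache.beam.sdk.io.gcp.bigtable.BigtableIO.
    // Instance and table are ValueProviders, so they are resolved when the
    // template is launched rather than when the template is created.
    results.get(btSuccessTag).apply("Write to BigTable",
            BigtableIO.write()
                    .withProjectId(options.getProject())
                    .withInstanceId(options.getBigTableInstanceId())
                    .withTableId(options.getBigTableTable()));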

    I hope this works for you. Let me know if anything is unclear, or if I overlooked something.