Dataflow: dynamic create disposition in Apache Beam


I want to choose the Create Disposition option dynamically, depending on the arguments. In the DataflowPipelineOptions I accept the load type as a ValueProvider argument. However, I am not able to get the string out of the ValueProvider to decide which create disposition option to use.

withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)

I want 'CREATE_IF_NEEDED' to be dynamic and to replace it with something like the following. Note that this is just pseudocode; I am looking for a solution.

create_disp = options.getLoad()
withCreateDisposition(create_disp)

Solution

  • You can pass a program argument representing createDisposition

    Program argument (CREATE_NEVER or CREATE_IF_NEEDED):

    --bqCreateDisposition=CREATE_NEVER
    

    In the options interface in Java, you can declare the field as an enum (here with a default value of CREATE_IF_NEEDED):

    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.options.Default;
    import org.apache.beam.sdk.options.Description;
    import org.apache.beam.sdk.options.PipelineOptions;
    
    public interface MyOptions extends PipelineOptions {
    
        // Parsed from --bqCreateDisposition; falls back to CREATE_IF_NEEDED when the argument is omitted.
        @Description("BQ create disposition")
        @Default.Enum("CREATE_IF_NEEDED")
        BigQueryIO.Write.CreateDisposition getBqCreateDisposition();
    
        void setBqCreateDisposition(BigQueryIO.Write.CreateDisposition value);
    }
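
The option still has to be handed to the BigQuery write. A minimal sketch of that wiring follows, assuming the Beam Java SDK with the GCP IO module on the classpath. The class name `DynamicDispositionSketch`, the helpers `parse` and `writeRows`, and the `rows`, `tableSpec` and `schema` parameters are hypothetical and not part of the original answer. Note that `withCreateDisposition` takes the enum itself rather than a `ValueProvider`, so the value has to be known when the pipeline graph is built; with classic templates this likely means it is fixed at template creation time.

```java
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class DynamicDispositionSketch {

    // Mirrors the MyOptions interface from the answer, nested here so the sketch compiles on its own.
    public interface MyOptions extends PipelineOptions {
        @Description("BQ create disposition")
        @Default.Enum("CREATE_IF_NEEDED")
        BigQueryIO.Write.CreateDisposition getBqCreateDisposition();

        void setBqCreateDisposition(BigQueryIO.Write.CreateDisposition value);
    }

    // Parses arguments such as --bqCreateDisposition=CREATE_NEVER into the typed options.
    public static MyOptions parse(String... args) {
        return PipelineOptionsFactory.fromArgs(args).withValidation().as(MyOptions.class);
    }

    // Hypothetical helper: rows, tableSpec and schema come from the rest of the pipeline.
    public static void writeRows(PCollection<TableRow> rows, String tableSpec,
                                 TableSchema schema, MyOptions options) {
        rows.apply("WriteToBigQuery",
            BigQueryIO.writeTableRows()
                .to(tableSpec)
                // A schema is required when the disposition is CREATE_IF_NEEDED.
                .withSchema(schema)
                // The enum value comes straight from the parsed options.
                .withCreateDisposition(options.getBqCreateDisposition()));
    }
}
```

Launching the pipeline with `--bqCreateDisposition=CREATE_NEVER` would then require the table to exist already, while omitting the argument falls back to `CREATE_IF_NEEDED`.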