python, parameters, google-cloud-platform, google-cloud-dataflow, apache-beam

Ways of using value provider parameter in Python Apache Beam


Right now I can only read the runtime value inside a class that I run through a ParDo. Is there another way to use the runtime parameter, for example in my plain functions?

This is the code I have right now:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions


class UserOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # Registered as a value provider so it can be set at template run time.
        parser.add_value_provider_argument('--firestore_document', default='')


def run(argv=None):
    pipeline_options = PipelineOptions(argv)
    user_options = pipeline_options.view_as(UserOptions)
    pipeline_options.view_as(SetupOptions).save_main_session = True

    with beam.Pipeline(options=pipeline_options) as p:
        # CallFirestore (a DoFn) and ReadDB2 are defined elsewhere in my code.
        rows = (
            p
            | 'Create inputs' >> beam.Create([''])
            | 'Call Firestore' >> beam.ParDo(
                CallFirestore(user_options.firestore_document))
            | 'Read DB2' >> beam.Map(ReadDB2))

I want user_options.firestore_document to be usable in other functions without having to go through a ParDo.


Solution

  • You can only read a value provider inside ParDos and Combines, because its value is not known until the pipeline actually runs. It is not possible to pass a value provider to Create, but you can define a DoFn that takes the value provider in its constructor and emits its value:

    class OutputValueProviderFn(beam.DoFn):
      def __init__(self, vp):
        self.vp = vp
    
      def process(self, unused_elm):
        yield self.vp.get()
    

    And in your pipeline, you would do the following:

    user_options = pipeline_options.view_as(UserOptions)
    
    with beam.Pipeline(options=pipeline_options) as p:
      my_value_provided_pcoll = (
          p
          | beam.Create([None])
          | beam.ParDo(OutputValueProviderFn(user_options.firestore_document)))
    

    That way you don't pass the value provider to Create, which is not possible, but you still get its value in a PCollection.