Tags: google-cloud-platform, google-cloud-dataflow, apache-beam

How to deploy a multi-language Google Dataflow pipeline using a Flex Template?


I have a Google Dataflow batch pipeline that uses Java external transforms.

It runs fine when I start the expansion service with java -jar and launch the pipeline with the Dataflow runner from my local machine.

How do I package this as a Flex Template, so that it can be run from Airflow?

I created the Flex Template with a Python base image and set FLEX_TEMPLATE_PYTHON_PY_FILE to my main Python file.
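For reference, my Dockerfile looks roughly like this (file names are illustrative, not the actual ones):

```dockerfile
# Google's launcher base image for Python Flex Templates
FROM gcr.io/dataflow-templates-base/python3-template-launcher-base

ARG WORKDIR=/template
WORKDIR ${WORKDIR}

# Pipeline code, Python dependencies, and the Java expansion-service jar
COPY main.py requirements.txt expansion-service.jar ${WORKDIR}/

# Tell the launcher which file is the pipeline entrypoint
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"
ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"

RUN pip install --no-cache-dir -r requirements.txt
```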

How can I start the expansion service inside the Docker container?

I tried running the jar from Python as a subprocess. The logs show that the expansion service has started, but the pipeline is unable to connect to it.

I got the error below:

debug_error_string = "UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:12345: Failed to connect to remote host: Connection refused {created_time:"2023-11-02T21:14:34.044478242+00:00", grpc_status:14}"

How do I package this multi-language pipeline as a Flex Template?

Any help is appreciated.


Solution

  • I found that, even though the documentation does not show any examples, it does mention JavaJarExpansionService.

    We can use this class inside our Python code to start the expansion service and pass it as an argument to beam.ExternalTransform when applying the external transform.

    from apache_beam.transforms.external import ImplicitSchemaPayloadBuilder, JavaJarExpansionService

    expansion_service = JavaJarExpansionService(path_to_jar=input_args.expansion_jar_path)

    beam.ExternalTransform('URN', ImplicitSchemaPayloadBuilder({ Payload }), expansion_service)