I have a Google Dataflow batch pipeline that uses Java external transforms.
It runs fine when I start the expansion service with `java -jar` and launch the pipeline on the Dataflow runner from my local machine.
How do I package this as a Flex Template so that it can be run from Airflow?
I created the Flex Template with a Python base image and set FLEX_TEMPLATE_PYTHON_PY_FILE to my main Python file.
How can I start the expansion service inside the Docker container?
I tried running the jar from Python as a subprocess. I can see in the logs that the expansion service started, but the pipeline is unable to connect to it.
I got the error below:
debug_error_string = "UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:12345: Failed to connect to remote host: Connection refused {created_time:"2023-11-02T21:14:34.044478242+00:00", grpc_status:14}"
How do I package this multi-language pipeline as a Flex Template?
Any help is appreciated.
I found that, even though the documentation does not show any examples, it mentions JavaJarExpansionService.
We can use this class in our Python code to start the expansion service and pass it as an argument to beam.ExternalTransform when applying the external transform:
expansion_service = JavaJarExpansionService(path_to_jar=input_args.expansion_jar_path)
beam.ExternalTransform('URN', ImplicitSchemaPayloadBuilder({ Payload }), expansion_service)
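For this to work inside a Flex Template, the jar has to be present in the template image and a JRE has to be installed, since JavaJarExpansionService launches the jar itself. A minimal Dockerfile sketch, assuming a Python launcher base image and hypothetical file names (main.py, my-expansion-service.jar) that you would replace with your own:

```dockerfile
FROM gcr.io/dataflow-templates-base/python3-template-launcher-base

ARG WORKDIR=/template
WORKDIR ${WORKDIR}

# Copy the pipeline and the expansion-service jar into the image
# (file names here are placeholders for your own artifacts).
COPY main.py .
COPY my-expansion-service.jar .

# A Java runtime is required because JavaJarExpansionService
# starts the expansion service as a java subprocess.
RUN apt-get update \
    && apt-get install -y --no-install-recommends default-jre \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir apache-beam[gcp]

ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"
```

The pipeline can then receive the jar's in-container path (e.g. /template/my-expansion-service.jar) as a template parameter and pass it to JavaJarExpansionService, as in the snippet above.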