As part of a DAG, I am triggering a GCP PySpark Dataproc job using the code below:

    job = DataProcPySparkOperator(
        dag=dag,
        gcp_conn_id=gcp_conn_id,
        region=region,
        main=pyspark_script_location_gcs,
        task_id='pyspark_job_1_submit',
        cluster_name=cluster_name,
        job_name="job_1"
    )
How can I pass a variable as a parameter to the PySpark job so that it is accessible in the script?
You can use the arguments parameter of DataProcPySparkOperator:
arguments (list) – Arguments for the job. (templated)
    job = DataProcPySparkOperator(
        gcp_conn_id=gcp_conn_id,
        region=region,
        main=pyspark_script_location_gcs,
        task_id='pyspark_job_1_submit',
        cluster_name=cluster_name,
        job_name="job_1",
        arguments=[
            "-arg1=arg1_value",  # or just "arg1_value" for positional args
            "-arg2=arg2_value"
        ],
        dag=dag
    )
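On the script side, Dataproc hands whatever you put in arguments to your main Python file as ordinary command-line arguments, so you can read them with sys.argv or argparse. A minimal sketch of the script, assuming the two example flags above:

    import argparse

    from pyspark.sql import SparkSession

    # Parse the flags passed through the operator's `arguments` list,
    # e.g. ["-arg1=arg1_value", "-arg2=arg2_value"].
    parser = argparse.ArgumentParser()
    parser.add_argument("-arg1")  # receives "arg1_value"
    parser.add_argument("-arg2")  # receives "arg2_value"
    args = parser.parse_args()

    spark = SparkSession.builder.appName("job_1").getOrCreate()
    print(args.arg1, args.arg2)

Since the field is templated, you can also pass Airflow macros, e.g. "-run_date={{ ds }}" (run_date is just an illustrative name), to make values like the execution date available inside the script.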