google-cloud-platform, airflow, google-compute-engine, google-cloud-composer

Python script path on Compute Engine for Cloud Composer BashOperator


I am working on GCP for the first time, and I am trying to execute a Python script on a Compute Engine instance using the BashOperator. I stored my Python script in the bucket where the Composer DAGs are located, but when I run the BashOperator it throws a "file not found" error. Where should I place my Python script so that it can be executed on the Compute Engine instance?

    bash_task = bash_operator.BashOperator(
        task_id='script_execution',
        bash_command='gcloud compute ssh --project '+PROJECT_ID+ ' --zone '+REGION+' '+GCE_INSTANCE+ '--command python3 script.py',
        dag=dag)

    python3: can't open file '/home/airflow/gcs/dags/script.py': [Errno 2] No such file or directory
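
Note: as written, the bash_command is missing a space before --command and does not quote the remote command. Also, /home/airflow/gcs/dags/ is the GCS bucket mount visible only to the Composer workers, not to the GCE VM, so the script has to be copied onto the VM first (for example with gsutil cp). A corrected form of the command, assuming script.py sits in the SSH user's home directory on the VM, would be:

    bash_task = bash_operator.BashOperator(
        task_id='script_execution',
        bash_command='gcloud compute ssh --project ' + PROJECT_ID +
                     ' --zone ' + REGION + ' ' + GCE_INSTANCE +
                     ' --command "python3 script.py"',
        dag=dag)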


Solution

    • Solution 1:

    Instead of executing the Python script on a separate Compute Engine VM instance, you can execute the Python code directly in Composer with the PythonOperator:

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.python_operator import PythonOperator
    from datetime import datetime

    def my_func(*op_args):
        # your Python script and logic here
        print(op_args)

    with DAG('python_dag',
             description='Python DAG',
             schedule_interval='*/5 * * * *',
             start_date=datetime(2018, 11, 1),
             catchup=False) as dag:
        dummy_task = DummyOperator(task_id='dummy_task', retries=3)
        python_task = PythonOperator(task_id='python_task',
                                     python_callable=my_func,
                                     op_args=['one', 'two', 'three'])

        dummy_task >> python_task
    
    • Solution 2:

    Use the SSHOperator in Cloud Composer; this topic can help:

    ssh launch script VM Composer

    In this case the script is located on the VM.
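
    A minimal sketch of that approach, assuming the apache-airflow-providers-ssh and apache-airflow-providers-google packages are available; PROJECT_ID, REGION and GCE_INSTANCE reuse the variables from the question, and the script path on the VM is a hypothetical placeholder:

    from airflow.providers.google.cloud.hooks.compute_ssh import ComputeEngineSSHHook
    from airflow.providers.ssh.operators.ssh import SSHOperator

    ssh_task = SSHOperator(
        task_id='script_execution',
        ssh_hook=ComputeEngineSSHHook(
            instance_name=GCE_INSTANCE,
            zone=REGION,                # must be the VM's zone
            project_id=PROJECT_ID,
            use_oslogin=True,
            use_iap_tunnel=False,
        ),
        command='python3 /home/user/script.py',  # placeholder path on the VM
        dag=dag)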

    • Solution 3:

    You can also consider rewriting your Python script with Apache Beam and running it on Dataflow, if it is not too complicated to rewrite. Dataflow has the advantage of being serverless, and Airflow provides built-in operators to launch Dataflow jobs.
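
    For example, with the Google provider's Dataflow operator (a sketch; the GCS paths, job name and region below are hypothetical placeholders, and py_file must point to a Beam pipeline, not an arbitrary script):

    from airflow.providers.google.cloud.operators.dataflow import DataflowCreatePythonJobOperator

    dataflow_task = DataflowCreatePythonJobOperator(
        task_id='dataflow_job',
        py_file='gs://my-bucket/pipelines/script.py',  # placeholder: Beam pipeline code
        job_name='script-job',
        options={'temp_location': 'gs://my-bucket/tmp/'},
        location='us-central1',
        dag=dag)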