I am working on GCP for the first time and I am trying to execute a Python script on a Compute Engine instance using a BashOperator. I stored my Python script in the bucket where the Composer DAGs are located, but when I run the BashOperator it throws a "file not found" error. Where should I place my Python script so that it can be executed on the Compute Engine instance?
```python
bash_task = bash_operator.BashOperator(
    task_id='script_execution',
    bash_command='gcloud compute ssh --project ' + PROJECT_ID + ' --zone ' + REGION + ' ' + GCE_INSTANCE + ' --command "python3 script.py"',
    dag=dag)
```

```
python3: can't open file '/home/airflow/gcs/dags/script.py': [Errno 2] No such file or directory
```
Instead of executing a Python script in a separate Compute Engine VM instance from Cloud Composer, you can execute the Python code directly in Composer with the PythonOperator:
```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator


def my_func(*op_args):
    print(op_args)
    # your Python script and logic here


with DAG('python_dag',
         description='Python DAG',
         schedule_interval='*/5 * * * *',
         start_date=datetime(2018, 11, 1),
         catchup=False) as dag:

    dummy_task = DummyOperator(task_id='dummy_task', retries=3)
    python_task = PythonOperator(task_id='python_task',
                                 python_callable=my_func,
                                 op_args=['one', 'two', 'three'])

    dummy_task >> python_task
```
Alternatively, you can use the SSHOperator in Cloud Composer; in this case the script is located on the VM itself, and you run it over SSH (see the sketch below).
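A minimal sketch of that approach, assuming an Airflow SSH connection to the VM has already been configured (the connection id `gce_ssh_connection` and the script path `/home/user/script.py` are placeholders for your own setup):

```python
from airflow.contrib.operators.ssh_operator import SSHOperator

# Runs the script that already lives on the GCE instance, over SSH.
# 'gce_ssh_connection' and the script path are placeholders.
ssh_task = SSHOperator(
    task_id='script_execution_over_ssh',
    ssh_conn_id='gce_ssh_connection',
    command='python3 /home/user/script.py',
    dag=dag)
```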
You can also consider rewriting your Python script with Apache Beam and running it on Dataflow, if the rewrite is not too complicated. Dataflow has the advantage of being serverless, and Airflow provides built-in operators to launch Dataflow jobs.
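If you go that route, a rough sketch with Airflow's Dataflow Python operator could look like the following; the pipeline file name, bucket and project values are placeholders, and your script would first need to be rewritten as a Beam pipeline:

```python
from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator

# Launches a Beam pipeline (the rewritten script) as a Dataflow job.
# The py_file path, project id and GCS temp location are placeholders.
dataflow_task = DataFlowPythonOperator(
    task_id='dataflow_job',
    py_file='/home/airflow/gcs/dags/beam_pipeline.py',
    dataflow_default_options={
        'project': 'your-project-id',
        'temp_location': 'gs://your-bucket/temp',
    },
    dag=dag)
```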