I've seen mentions of SIGKILL occurring for others, but I think my use case is slightly different.
I'm using the managed Airflow service through GCP Cloud Composer, running Airflow 2. I have 3 worker nodes, all set to the default instance creation settings.
The environment runs DAGs fairly smoothly for the most part (API calls, moving files from on-prem); however, it seems to be having a terribly hard time executing a couple of slightly larger jobs.
One of these jobs uses a Samba connector to incrementally backfill missing data and store it on GCS. The other is a Salesforce API connector.
These jobs run locally with absolutely no issue, so I'm wondering why I'm encountering these problems here. There should be plenty of memory in the cluster to run these tasks, although scaling up the cluster for just two jobs doesn't seem particularly efficient.
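For context, the Samba job does something roughly like this (a heavily simplified sketch; the hostnames, share and bucket names are placeholders, and pysmb plus google-cloud-storage stand in for the actual connector code). Note the sketch spools each file to local disk, so the worker shouldn't need to hold a whole file in memory:

```python
import tempfile

from google.cloud import storage
from smb.SMBConnection import SMBConnection


def copy_share_file_to_gcs(filename: str) -> None:
    # Placeholder credentials/hosts; the real values come from connections/secrets.
    conn = SMBConnection("svc_user", "secret", "airflow-worker", "fileserver")
    conn.connect("fileserver.example.internal", 445)

    client = storage.Client()
    bucket = client.bucket("my-backfill-bucket")  # hypothetical bucket

    # Spool to a temp file on disk rather than an in-memory buffer so the
    # worker never has to hold the whole file in memory at once.
    with tempfile.NamedTemporaryFile() as tmp:
        conn.retrieveFile("share_name", f"/exports/{filename}", tmp)
        tmp.flush()
        bucket.blob(f"backfill/{filename}").upload_from_filename(tmp.name)

    conn.close()
```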
I have tried both DAG and task timeouts, and I've tried increasing the connection timeout on the Samba client.
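Concretely, the timeout settings look something like this (illustrative values and a hypothetical DAG id; `dagrun_timeout` caps the whole DAG run while `execution_timeout` caps the individual task):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def copy_share_file_to_gcs(filename: str) -> None:
    ...  # the Samba -> GCS logic from the sketch above


with DAG(
    dag_id="samba_backfill",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    dagrun_timeout=timedelta(hours=2),  # DAG-level timeout
    catchup=False,
) as dag:
    backfill = PythonOperator(
        task_id="copy_share_file_to_gcs",
        python_callable=copy_share_file_to_gcs,
        op_kwargs={"filename": "example.csv"},
        execution_timeout=timedelta(hours=1),  # task-level timeout
        retries=1,
    )
```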
So could someone please share some insight into how I can get Airflow to execute these tasks without killing the session, even if they take longer?
Happy to add more detail if required, but I don't currently have the relevant data in front of me to share.
Frustratingly, increasing resources meant the jobs could run. I don't know why the original resources weren't enough, as they really should have been. But optimisation for fully managed solutions isn't overly straightforward beyond adding cost.