Airflow task fails with "Task exited with return code Negsignal.SIGKILL"

We are running a JdbcOperator task to refresh the metastore in Impala. The task fails with the return code Negsignal.SIGKILL. Below are the logs from the Airflow UI:

[2021-09-08 16:47:32,659] {} INFO - Running <TaskInstance: XXX.XXX-refresh-impala 2021-09-08T12:22:00+00:00 [running]> on host boblivefjanonsalesorderaddressboblivefjanonsalesorderaddressref
[2021-09-08 16:47:32,746] {} INFO - Executing: ['SET MEM_LIMIT=400000000;', 'SET SYNC_DDL=1;', 'INVALIDATE METADATA XXX.XXXX;', 'COMPUTE STATS XXX.XXXX;']
[2021-09-08 16:47:32,749] {} INFO - Using connection to: id: dwh_impala. Host: jdbc:impala://cloudera-impala-proxy.XXX.XX.XXXX.XX:XXX/;AuthMech=3;ssl=1, Port: None, Schema: , Login: XXX-XXXXX-XXX, Password: XXXXXXXX, extra: XXXXXXXX
[2021-09-08 16:47:36,992] {} INFO - Task exited with return code Negsignal.SIGKILL

If I check the logs of the airflow-scheduler, the task appears to have completed successfully. The logs are:

<TaskInstance: XXX_status-refresh-impala 2021-09-07 04:14:00+00:00 [scheduled]>
[2021-09-08 18:22:37,740] {} INFO - Setting the following 1 tasks to queued state:
    <TaskInstance: XXX_status-refresh-impala 2021-09-07 04:14:00+00:00 [queued]>
[2021-09-08 18:22:37,741] {} INFO - Sending ('XXX', 'XXX-refresh-impala', datetime.datetime(2021, 9, 7, 4, 14, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 5) to executor with priority 1 and queue celery
[2021-09-08 18:22:37,741] {} INFO - Adding to queue: ['airflow', 'run', 'XXX', 'XXX-refresh-impala', '2021-09-07T04:14:00+00:00', '--local', '--pool', 'default_pool', '-sd', '/usr/local/airflow/dags/']
[2021-09-08 18:22:37,741] {} INFO - Add task ('XXX', 'XXX-refresh-impala', datetime.datetime(2021, 9, 7, 4, 14, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 5) with command ['airflow', 'run', 'XXX', 'XXX-refresh-impala', '2021-09-07T04:14:00+00:00', '--local', '--pool', 'default_pool', '-sd', '/usr/local/airflow/dags/'] with executor_config {}
[2021-09-08 18:22:37,742] {} INFO - Kubernetes job is (('XXX', 'XXX-refresh-impala', datetime.datetime(2021, 9, 7, 4, 14, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 5), ['airflow', 'run', 'XXX', 'XXX-refresh-impala', '2021-09-07T04:14:00+00:00', '--local', '--pool', 'default_pool', '-sd', '/usr/local/airflow/dags/'], None)
[2021-09-08 18:22:37,821] {} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c had an event of type ADDED
[2021-09-08 18:22:37,821] {} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c Pending
[2021-09-08 18:22:37,830] {} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c had an event of type MODIFIED
[2021-09-08 18:22:37,830] {} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c Pending
[2021-09-08 18:22:37,852] {} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c had an event of type MODIFIED
[2021-09-08 18:22:37,852] {} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c Pending
[2021-09-08 18:22:39,796] {} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c had an event of type MODIFIED
[2021-09-08 18:22:39,797] {} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c is Running
[2021-09-08 18:23:15,889] {} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c had an event of type MODIFIED
[2021-09-08 18:23:15,890] {} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c Succeeded
[2021-09-08 18:23:17,737] {} INFO - Attempting to finish pod; pod_id: XXX-bdbca784bcf240958282fff8e93eeb4c; state: None; labels: {'airflow-worker': '95afa541-026c-48a1-9431-526e21b4d4e3', 'airflow_version': '1.10.15', 'app': 'data-anonymization-lite-airflow', 'component': 'worker', 'dag_id': 'XXX', 'execution_date': '2021-09-07T04_14_00_plus_00_00', 'executor': 'True', 'kubernetes_executor': 'True', 'platform': '', 'pod_operator': 'False', 'release': 'data-anonymization-lite-airflow', 'task_id': 'XXX-refresh-impala', 'tier': 'airflow', 'tribe': 'data', 'try_number': '5', 'workspace': ''}
[2021-09-08 18:23:17,740] {} INFO - Found matching task XXX-XXX-refresh-impala (2021-09-07 04:14:00+00:00) with current state of running
[2021-09-08 18:23:17,741] {} INFO - Changing state of (('XXX', 'XXX-refresh-impala', datetime.datetime(2021, 9, 7, 4, 14, tzinfo=tzlocal()), 5), None, 'XXX-bdbca784bcf240958282fff8e93eeb4c', 'data-anonymization-lite', '406836844') to None
[2021-09-08 18:23:17,760] {} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c had an event of type MODIFIED
[2021-09-08 18:23:17,760] {} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c Succeeded
[2021-09-08 18:23:17,764] {} INFO - Deleted pod: ('XXX', 'XXX-refresh-impala', datetime.datetime(2021, 9, 7, 4, 14, tzinfo=tzlocal()), 5) in namespace data-anonymization-lite
[2021-09-08 18:23:17,767] {} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c had an event of type DELETED
[2021-09-08 18:23:17,767] {} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c Succeeded
[2021-09-08 18:23:19,739] {} INFO - Attempting to finish pod; pod_id: XXX-bdbca784bcf240958282fff8e93eeb4c; state: None; labels: {'airflow-worker': '95afa541-026c-48a1-9431-526e21b4d4e3', 'airflow_version': '1.10.15', 'app': 'data-anonymization-lite-airflow', 'component': 'worker', 'dag_id': 'XXX', 'execution_date': '2021-09-07T04_14_00_plus_00_00', 'executor': 'True', 'kubernetes_executor': 'True', 'platform': '', 'pod_operator': 'False', 'release': 'data-anonymization-lite-airflow', 'task_id': 'XXX-refresh-impala', 'tier': 'airflow', 'tribe': 'data', 'try_number': '5', 'workspace': ''}
[2021-09-08 18:23:19,742] {} INFO - Found matching task XXX-XXX-refresh-impala (2021-09-07 04:14:00+00:00) with current state of running
[2021-09-08 18:23:19,743] {} INFO - Attempting to finish pod; pod_id: XXX-bdbca784bcf240958282fff8e93eeb4c; state: None; labels: {'airflow-worker': '95afa541-026c-48a1-9431-526e21b4d4e3', 'airflow_version': '1.10.15', 'app': 'data-anonymization-lite-airflow', 'component': 'worker', 'dag_id': 'XXX', 'execution_date': '2021-09-07T04_14_00_plus_00_00', 'executor': 'True', 'kubernetes_executor': 'True', 'platform': '', 'pod_operator': 'False', 'release': 'data-anonymization-lite-airflow', 'task_id': 'XXX-refresh-impala', 'tier': 'airflow', 'tribe': 'data', 'try_number': '5', 'workspace': ''}
[2021-09-08 18:23:19,746] {} INFO - Found matching task XXX-XXX-refresh-impala (2021-09-07 04:14:00+00:00) with current state of running
[2021-09-08 18:23:19,747] {} INFO - Changing state of (('XXX', 'XXX-refresh-impala', datetime.datetime(2021, 9, 7, 4, 14, tzinfo=tzlocal()), 5), None, 'XXX-bdbca784bcf240958282fff8e93eeb4c', 'data-anonymization-lite', '406836859') to None
[2021-09-08 18:23:19,753] {} INFO - Deleted pod: ('XXX', 'XXX-refresh-impala', datetime.datetime(2021, 9, 7, 4, 14, tzinfo=tzlocal()), 5) in namespace data-anonymization-lite
[2021-09-08 18:23:19,753] {} INFO - Changing state of (('XXX', 'XXX-refresh-impala', datetime.datetime(2021, 9, 7, 4, 14, tzinfo=tzlocal()), 5), None, 'XXX-bdbca784bcf240958282fff8e93eeb4c', 'data-anonymization-lite', '406836860') to None
[2021-09-08 18:23:19,758] {} INFO - Deleted pod: ('XXX', 'XXX-refresh-impala', datetime.datetime(2021, 9, 7, 4, 14, tzinfo=tzlocal()), 5) in namespace data-anonymization-lite
[2021-09-08 18:23:21,766] {} INFO - Executor reports execution of XXX_status-refresh-impala execution_date=2021-09-07 04:14:00+00:00 exited with status None for try_number 5  

We are running Airflow 1.10.15 on Kubernetes.

Any help or guidance would be really appreciated. Thanks


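  • The issue turned out to be the amount of resources allocated to the worker pods. The task was memory-intensive, and a SIGKILL on such a task is consistent with the process being killed for exceeding its memory limit. Increasing the memory for the worker pods fixed it in this case.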
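
One way to raise the memory for a single task is sketched below. It assumes the KubernetesExecutor on Airflow 1.10.x, where a task's `executor_config` accepts a `"KubernetesExecutor"` entry with `request_memory` and `limit_memory`. The helper name `worker_memory_config` and the sizes `1Gi`/`2Gi` are placeholders for illustration, not values from the original setup, so tune them to what the task actually needs.

```python
# Sketch: per-task memory override for worker pods (Airflow 1.10.x, KubernetesExecutor).
# The helper name and the sizes used below are illustrative assumptions.

def worker_memory_config(request_memory, limit_memory):
    """Build an executor_config dict that sets the worker pod's memory."""
    return {
        "KubernetesExecutor": {
            "request_memory": request_memory,  # memory the pod asks the cluster for
            "limit_memory": limit_memory,      # hard cap for the pod's memory
        }
    }


executor_config = worker_memory_config("1Gi", "2Gi")

# Passing it to the task (needs an Airflow 1.10.x install, so left commented out;
# the task_id and SQL here are hypothetical):
#
# from airflow.operators.jdbc_operator import JdbcOperator
#
# refresh_impala = JdbcOperator(
#     task_id="refresh-impala",
#     jdbc_conn_id="dwh_impala",
#     sql=["INVALIDATE METADATA my_db.my_table;"],
#     executor_config=executor_config,
#     dag=dag,
# )
```

Setting it per task keeps the larger allocation limited to the memory-heavy refresh rather than to every worker pod. If most tasks need more memory, changing the default worker pod resources in the deployment is probably the simpler route.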