
Airflow KubernetesPodOperator Losing Connection to Worker Pod


I'm experiencing an odd issue with KubernetesPodOperator on Airflow 1.1.14.

Essentially, for some jobs Airflow loses contact with the pod it creates.

[2021-02-10 07:30:13,657] {taskinstance.py:1150} ERROR - ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

When I check the logs in Kubernetes with kubectl logs, I can see that the job carried on past the connection broken error.

The connection broken error seems to happen exactly 1 hour after the last logs that Airflow pulls from the pod (we do have a 1 hour config on connections), but the pod keeps running happily in the background.

I've seen this behaviour repeatedly, and it tends to happen with longer-running jobs that have a gap in their log output, but I have no other leads. Happy to update the question if any specifics are missing.


Solution

  • As I mentioned in the comments, I think you can try setting the operator's get_logs parameter to False (the default value is True).

    Take a look: airflow-connection-broken, airflow-connection-issue.
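
A minimal sketch of the suggestion above, assuming the Airflow 1.10.x import path for the operator. The DAG id, task id, namespace, image, and command are hypothetical placeholders; only get_logs=False comes from the answer. The imports are guarded so the snippet still loads on a machine without Airflow installed.

```python
from datetime import datetime

# Hypothetical task settings; only get_logs=False is taken from the answer.
POD_TASK_KWARGS = {
    "task_id": "long_running_job",
    "name": "long-running-job",
    "namespace": "default",
    "image": "python:3.8-slim",
    "cmds": ["python", "-c"],
    # Placeholder workload: a job that stays silent for two hours.
    "arguments": ["import time; time.sleep(2 * 60 * 60)"],
    # Do not stream the pod's output into the Airflow task log
    # (the default is True).
    "get_logs": False,
}

try:
    from airflow import DAG
    # Import path assumed for Airflow 1.10.x.
    from airflow.contrib.operators.kubernetes_pod_operator import (
        KubernetesPodOperator,
    )
except ImportError:
    # Airflow is not available in this environment.
    DAG = KubernetesPodOperator = None

if KubernetesPodOperator is not None:
    with DAG(
        dag_id="pod_without_log_streaming",
        start_date=datetime(2021, 2, 1),
        schedule_interval=None,
    ) as dag:
        run_job = KubernetesPodOperator(**POD_TASK_KWARGS)
```

The trade-off is that the pod's output no longer appears in the Airflow task log, so you would read it with kubectl logs instead.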