amazon-web-services, airflow, apache-atlas

Apache Airflow and Apache Atlas Timeout


I am running Apache Airflow in AWS ECS and Apache Atlas on EC2. I was able to connect a local instance of Apache Airflow to Apache Atlas on EC2; however, I am not able to connect the Airflow instance running in ECS to the Atlas instance on EC2. I get the following error when an Airflow task in a DAG tries to push lineage information to Apache Atlas.

[2021-02-18 18:49:37,301] {connectionpool.py:752} WARNING - Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fb1e2e87410>, 'Connection to <ip-address> timed out. (connect timeout=10)')': /api/atlas/v2/types/typedefs
[2021-02-18 18:49:47,302] {connectionpool.py:752} WARNING - Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fb1e2e87b10>, 'Connection to <ip-address> timed out. (connect timeout=10)')': /api/atlas/v2/types/typedefs
[2021-02-18 18:49:57,311] {connectionpool.py:752} WARNING - Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fb1e2e9f190>, 'Connection to <ip-address> timed out. (connect timeout=10)')': /api/atlas/v2/types/typedefs
[2021-02-18 18:50:07,319] {connectionpool.py:752} WARNING - Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fb1e2e9f7d0>, 'Connection to <ip-address> timed out. (connect timeout=10)')': /api/atlas/v2/types/typedefs
[2021-02-18 18:50:17,327] {connectionpool.py:752} WARNING - Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fb1e2e9fe10>, 'Connection to <ip-address> timed out. (connect timeout=10)')': /api/atlas/v2/types/typedefs
[2021-02-18 18:50:27,338] {taskinstance.py:1150} ERROR - HTTPConnectionPool(host='<ip-address>', port=21000): Max retries exceeded with url: /api/atlas/v2/types/typedefs (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fb1e2ea3490>, 'Connection to <ip-address> timed out. (connect timeout=10)'))
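
For context, the lineage push happens when a task with declared inlets/outlets completes; a minimal DAG along the lines of the Airflow 1.10 lineage example is enough to trigger the /api/atlas/v2/types/typedefs call shown above. This is only a sketch; the dag_id, task and file paths are illustrative, not from my actual DAG.

from datetime import datetime, timedelta

from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.lineage.datasets import File

dag = DAG(
    dag_id="example_lineage",
    start_date=datetime(2021, 2, 1),
    schedule_interval=None,
    dagrun_timeout=timedelta(minutes=60),
)

f_in = File("/tmp/input.csv")
f_out = File("/tmp/output.csv")

# Declaring inlets/outlets is what makes Airflow's AtlasBackend call the
# Atlas REST API once the task finishes.
push_lineage = BashOperator(
    task_id="push_lineage",
    bash_command="echo 1",
    dag=dag,
    inlets={"datasets": [f_in]},
    outlets={"datasets": [f_out]},
)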

Edit: Posted code as requested

airflow.cfg configuration

[lineage]
backend = airflow.lineage.backend.atlas.AtlasBackend

[atlas]
host = <ip-address>
port = 21000
username = admin
password = <password>
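
A quick way to rule out Airflow itself is to hit the same Atlas endpoint directly from inside the ECS task. A minimal check along these lines (host and credentials are placeholders matching the config above) shows whether the problem is network reachability rather than the lineage backend:

import requests

ATLAS_HOST = "<ip-address>"   # same host value as in airflow.cfg (placeholder)
ATLAS_PORT = 21000

# The Atlas lineage backend calls this typedefs endpoint first, so if this
# request also times out, the issue is connectivity, not Airflow.
resp = requests.get(
    "http://{}:{}/api/atlas/v2/types/typedefs".format(ATLAS_HOST, ATLAS_PORT),
    auth=("admin", "<password>"),
    timeout=10,
)
print(resp.status_code)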

Solution

  • I was able to solve the problem by pointing Airflow at the private IP address of the EC2 instance running Atlas instead of its public IP address. In addition, I had to update the inbound rules on the security group of the Atlas EC2 instance to allow traffic from the Airflow webserver's private IP address. A sketch of that security-group change is shown below.
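
The following boto3 snippet is a sketch of the security-group change described above; the group ID and CIDR are placeholders, not values from my setup. It allows inbound traffic on Atlas's port 21000 from the Airflow webserver's private IP only.

import boto3

ec2 = boto3.client("ec2")

ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",            # security group of the Atlas EC2 instance
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 21000,
            "ToPort": 21000,
            "IpRanges": [
                {
                    "CidrIp": "10.0.1.25/32",  # Airflow webserver's private IP
                    "Description": "Airflow lineage -> Atlas",
                }
            ],
        }
    ],
)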