
Apache Atlas and Airflow Integration


I am trying to integrate an Apache Atlas instance that I have running with Apache Airflow. After setting up the connection in airflow.cfg, I tried running a DAG from the Airflow scheduler and got the following error in the task log:

[2021-02-02 20:50:47,958] {connectionpool.py:752} WARNING - Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f464b856950>: Failed to establish a new connection: [Errno 111] Connection refused')': /api/atlas/v2/types/typedefs

[2021-02-02 20:50:47,960] {taskinstance.py:1150} ERROR - HTTPConnectionPool(host='localhost', port=21000): Max retries exceeded with url: /api/atlas/v2/types/typedefs (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f464b8650d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
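Errno 111 (ECONNREFUSED) means the TCP connection to localhost:21000 was actively rejected, i.e. nothing was accepting connections at that address as seen from the process running the task. One way to check this independently of Airflow is a plain socket probe run from the same machine or container as the scheduler/worker. This is a minimal standard-library sketch; `port_is_open` is a hypothetical helper, and the host and port are the ones from the question's airflow.cfg.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection resolves the host and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (errno 111), timeouts and DNS failures.
        return False


if __name__ == "__main__":
    # Host and port from the [atlas] section of airflow.cfg in the question.
    print("Atlas reachable:", port_is_open("localhost", 21000))
```

If this prints `False` from inside the Airflow environment while Atlas is up, the problem is the address Airflow is using rather than the lineage backend itself.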

My airflow.cfg is configured as follows:

[lineage]
backend = airflow.lineage.backend.atlas.AtlasBackend

[atlas]
username = <username>
password = <password>
host = localhost
port = 21000
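To rule out a configuration mistake, it can help to read the `[atlas]` section back and print the endpoint that ends up being called. The sketch below assumes the Atlas client joins scheme, host and port as `http://{host}:{port}` (consistent with the `HTTPConnectionPool(host='localhost', port=21000)` in the log above); under that assumption, a value such as `http://localhost` for `host` would produce a malformed URL, so the helper rejects it. `atlas_base_url` is a hypothetical helper, not Airflow's own code.

```python
import configparser


def atlas_base_url(cfg_text: str, scheme: str = "http") -> str:
    """Build an Atlas base URL from the [atlas] section of airflow.cfg-style text.

    Assumes the client prepends `scheme://` itself, so `host` must be a bare hostname.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    host = parser.get("atlas", "host").strip()
    port = parser.getint("atlas", "port")
    if "://" in host:
        raise ValueError(f"host should be a bare hostname, got {host!r}")
    return f"{scheme}://{host}:{port}"


# Same settings as in the question (credentials left as placeholders).
CFG = """
[lineage]
backend = airflow.lineage.backend.atlas.AtlasBackend

[atlas]
username = <username>
password = <password>
host = localhost
port = 21000
"""

if __name__ == "__main__":
    # The typedefs endpoint is the one that failed in the log.
    print(atlas_base_url(CFG) + "/api/atlas/v2/types/typedefs")
```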

I have also tried setting the host to http://localhost. I am not sure where to look in Atlas to find out why the connection is being refused.


Solution

  • I was able to solve the problem by adding the --hostname flag when starting the Docker container for Atlas, and then using that hostname as the host in airflow.cfg.
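Because the fix comes down to which hostname Airflow uses to reach the Atlas container, it is worth confirming that the name passed to --hostname actually resolves from wherever the Airflow scheduler and workers run. How it resolves (a shared Docker network, an /etc/hosts entry, etc.) depends on the setup and is an assumption here. A minimal standard-library sketch; `resolve_host` is a hypothetical helper.

```python
import socket
from typing import Optional


def resolve_host(hostname: str) -> Optional[str]:
    """Return the IPv4 address that hostname resolves to, or None if lookup fails."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        # Name could not be resolved from this machine/container.
        return None
```

For example, calling `resolve_host("atlas-server")` from the Airflow environment, where `atlas-server` is a placeholder for whatever value was given to --hostname, should return an address; `None` means Airflow will not be able to reach Atlas by that name either.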