cassandra apache-atlas

How do I save lineage info in Apache Atlas when using Apache Cassandra and Elasticsearch?


I am planning to deploy Apache Atlas with Apache Cassandra as the storage backend and Elasticsearch as the index backend. How can I save lineage info in this setup? Atlas provides a GET API to retrieve lineage info, but there seems to be no way to save it.


Solution

  • In Atlas, lineage is created when entities are linked through processes via their inputs and outputs.

    Example: lineage between two hive_table entities looks like this:

    T1(hive_table)--->P1(hive_process)--->T2(hive_table)

    So, basically, the entities need to be linked through an entity of a process type.

    In Atlas, processes are themselves entities and can be created with POST /api/atlas/v2/entity, with inputs and outputs defined in the request body. For the hive_process above:

    POST: /api/atlas/v2/entity
        {
          "entity": {
            "typeName": "hive_process",
            "attributes": {
              "outputs": [
                {
                  "guid": "2", 
                  "typeName": "hive_table",
                  "uniqueAttributes": {
                    "qualifiedName": "t2@primary"
                  }
                }
              ],
              "qualifiedName": "p1@primary",
              "inputs": [
                {
                  "guid": "1",
                  "typeName": "hive_table",
                  "uniqueAttributes": {
                    "qualifiedName": "t1@primary"
                  }
                }
              ],
              "name": "P1-Process"
            }
          }
        }
    

    An important point before creating the process: the referenced entities (inputs and outputs) must already exist; otherwise process creation will fail.

    If the pre-existing types don't fit your requirements, you can of course define your own entity and process types in Atlas.

    More about the Atlas type system is available on the Apache Atlas site.
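
The process-creation request from the answer could be built and sent from Python roughly as follows. This is only a sketch under a few assumptions: the helper names (`object_ref`, `build_process_entity`, `post_entity`) are made up for illustration, the base URL and Authorization header are placeholders for your own deployment, and the references rely on `typeName` plus `uniqueAttributes` alone (the placeholder `guid` values from the JSON above are left out), assuming Atlas resolves existing entities by their qualifiedName.

```python
import json
import urllib.request


def object_ref(type_name, qualified_name):
    """Reference an already existing entity by type and qualifiedName."""
    return {
        "typeName": type_name,
        "uniqueAttributes": {"qualifiedName": qualified_name},
    }


def build_process_entity(process_type, qualified_name, name, inputs, outputs):
    """Build the request body for POST /api/atlas/v2/entity."""
    return {
        "entity": {
            "typeName": process_type,
            "attributes": {
                "qualifiedName": qualified_name,
                "name": name,
                "inputs": inputs,
                "outputs": outputs,
            },
        }
    }


def post_entity(base_url, payload, auth_header):
    """Send the payload to Atlas.

    base_url and auth_header are placeholders (assumptions), for example
    "http://localhost:21000" and a Basic auth header for your own setup.
    """
    request = urllib.request.Request(
        url=base_url.rstrip("/") + "/api/atlas/v2/entity",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": auth_header,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Same lineage as in the answer: t1 (input) -> p1 (process) -> t2 (output).
payload = build_process_entity(
    process_type="hive_process",
    qualified_name="p1@primary",
    name="P1-Process",
    inputs=[object_ref("hive_table", "t1@primary")],
    outputs=[object_ref("hive_table", "t2@primary")],
)
print(json.dumps(payload, indent=2))
```

Calling `post_entity(base_url, payload, auth_header)` would then submit the process; both tables must already exist in Atlas at that point. With custom types, the same builder should work by passing your own type names instead of `hive_table` and `hive_process`.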