
Simple example for adding relationships between Atlas entities?


What is the correct way to use the REST API to add a relationship between entities in Apache Atlas? Looking at the REST API docs, I find it difficult to tell what some of the fields mean, which ones are required (and what happens if they are omitted), and what their default values should be. The examples use what appear to be placeholder values: for example, what is provenanceType? And the propagateTags field appears to expect some kind of enumeration value, but the valid options are never specified.

Could someone provide an example of what this looks like with real, valid values? E.g., if I had two entities E1 and E2 already added to Atlas and wanted to establish a relationship between them, I would want to do something like...

curl -X POST --header 'Content-Type: application/json;charset=UTF-8' --header 'Accept: application/json' -d '{<simplified json>}' 'https://atlas-server-hostname:21000/v2/relationship'

Trying this:

[hph_etl@HW03 ~]$ curl -vv -u admin:admin -X POST --header 'Content-Type: application/json;charset=UTF-8' --header 'Accept: application/json' -d '{ \
   "createTime": 1565135406, \
   "createdBy": "hph_etl", \
   "end1": { \
     "guid": "2ddcda5b-2489-4636-a9ab-12b199c02422", \
     "typeName": "hdfs_path" \
   }, \
   "end2": { \
     "guid": "a33f45de-13d0-4a30-9df7-b0e02eb0dfd5", \
     "typeName": "hdfs_path" \
   }, \
   "guid": "2ddcda5b-2489-4636-a9ab-12b199c02422", \
   "propagateTags": "TWO_TO_ONE", \
   "status": "ACTIVE", \
   "typeName": "hdfs_path" \
 }' 'http://HW03.co.local:21000/api/atlas/v2/relationship'


* About to connect() to HW03.co.local port 21000 (#0)
*   Trying 172.18.4.48...
* Connected to HW03.co.local (172.18.4.48) port 21000 (#0)
* Server auth using Basic with user 'admin'
> POST /v2/relationship HTTP/1.1
> Authorization: Basic xxxxxx
> User-Agent: curl/7.29.0
> Host: HW03.co.local:21000
> Content-Type: application/json;charset=UTF-8
> Accept: application/json
> Content-Length: 442
>
* upload completely sent off: 442 out of 442 bytes
< HTTP/1.1 404 Not Found
< Date: Wed, 07 Aug 2019 01:07:44 GMT
< Set-Cookie: ATLASSESSIONID=xxxxxx;Path=/;HttpOnly
< X-Frame-Options: DENY
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< Strict-Transport-Security: max-age=31536000; includeSubDomains
< Content-Type: text/html;charset=utf-8
< Content-Length: 2265
< Server: Jetty(9.3.14.v20161028)
<
<!doctype html>
<!--
....
*
*     http://www.apache.org/licenses/LICENSE-2.0
....
-->
<!--[if gt IE 8]>
<script type="text/javascript">
function Redirect() {
window.location.assign("login.jsp");
}
Redirect();
</script>
<![endif]-->
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if gt IE 7]>
<script src="js/external_lib/es5-shim.min.js"></script>
<script src="js/external_lib/respond.min.js"></script>
<![endif]-->
<html lang="en">

<head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Atlas</title>
    ....
</head>

<body>
<div id="wrapper">
    <!-- Page Content Begin -->
    <div id="page-content-wrapper">
        <div class="page-title clearfix">
            <h3>Looking for something?</h3>
            <p>We're sorry. The web address you're looking for is not a functioning page in Apache Atlas. Please try navigating from <a href="index.html">Apache Atlas Home</a></p>
        </div>
    </div>
    <!-- Page Content End -->
</div>
</body>

</html>
* Connection #0 to host HW03.co.local left intact

does not work, and I'm not sure what to make of the error message. Even the simpler request

curl -vv -u admin:admin -X POST --header 'Content-Type: application/json;charset=UTF-8' --header 'Accept: application/json' -d '{ \
   "end1": { \
     "guid": "2ddcda5b-2489-4636-a9ab-12b199c02422" \
   }, \
   "end2": { \
     "guid": "a33f45de-13d0-4a30-9df7-b0e02eb0dfd5" \
   }, \
   "typeName": "AtlasRelationshipDef" \
 }' 'http://HW03.ucera.local:21000/api/atlas/v2/relationship'

returns a similarly uninformative error:

* upload completely sent off: 211 out of 211 bytes
< HTTP/1.1 500 Internal Server Error
< Date: Mon, 12 Aug 2019 19:57:44 GMT
< Set-Cookie: ATLASSESSIONID=xxxxxx;Path=/;HttpOnly
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< X-Frame-Options: DENY
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< Strict-Transport-Security: max-age=31536000; includeSubDomains
< Content-Type: text/plain
< Transfer-Encoding: chunked
< Server: Jetty(9.3.14.v20161028)
<
* Connection #0 to host HW03.co.local left intact
There was an error processing your request. It has been logged (ID 6d64bc3a1a910e46)

Checking /var/logs/atlas/application.log on the Atlas host, I see yet another uninformative error message:

[hph_etl@HW03 atlas]$ cat application.log | grep -C 2 6d64bc3a1a910e46
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2019-08-12 09:57:44,880 ERROR - [pool-2-thread-10 - 8a5535b1-6544-4f9b-b3ad-8bec5e8d6fcd:] ~ Error handling a request: 6d64bc3a1a910e46 (ExceptionMapperUtil:32)
javax.ws.rs.WebApplicationException
        at com.sun.jersey.server.impl.uri.rules.TerminatingRule.accept(TerminatingRule.java:66)

Note that even Hortonworks seems to offer only a clumsy way of getting the GUIDs needed to link the entities.

What could be going wrong here? Are there any better docs than those linked to for understanding the API?


Solution

  • A relationship between existing Atlas entities can be created by identifying each end (end1 and end2) either by its entity GUID or by its uniqueAttributes, which can be qualifiedName or any other unique attribute.
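If you would rather work with GUIDs (as in the question), each entity's GUID can be looked up from its type and a unique attribute. Below is a minimal Python sketch of that lookup, assuming the v2 endpoint GET /api/atlas/v2/entity/uniqueAttribute/type/{typeName}, which takes unique attributes as attr:-prefixed query parameters and returns the entity under an "entity" key. The helper names are hypothetical, and the code only builds the URL and parses a response body; it does not contact the server.

```python
import json
import urllib.parse


def unique_attribute_url(base_url: str, type_name: str, **unique_attrs: str) -> str:
    """Build the lookup URL for an entity identified by its unique attributes.

    Each attribute becomes a query parameter named "attr:<attribute>".
    """
    path = "/api/atlas/v2/entity/uniqueAttribute/type/" + urllib.parse.quote(type_name)
    # Keep ":" readable in the parameter names; values are percent-encoded.
    query = urllib.parse.urlencode(
        {f"attr:{name}": value for name, value in unique_attrs.items()}, safe=":"
    )
    return f"{base_url.rstrip('/')}{path}?{query}"


def guid_from_response(body: str) -> str:
    """Extract the entity GUID from the JSON body returned by the lookup."""
    return json.loads(body)["entity"]["guid"]
```

For example, unique_attribute_url("http://HW03.co.local:21000", "hive_db", qualifiedName="db@cluster") ends in /uniqueAttribute/type/hive_db?attr:qualifiedName=db%40cluster; the @ is percent-encoded as usual for a query string.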

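    Note that the top-level typeName is the name of the relationship def, while the typeName inside end1 and end2 is the entity's type name. Both requests in the question get this wrong: hdfs_path is an entity type, and AtlasRelationshipDef is the name of Atlas's relationship-definition model class, not of a relationship def.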
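To make that distinction concrete, here is a minimal Python sketch that builds the request body, with each end identified by either a GUID or a uniqueAttributes map. The function names are hypothetical; the field names (typeName, end1, end2, guid, uniqueAttributes) are the ones used in this answer.

```python
import json
from typing import Optional


def object_id(entity_type: str, guid: Optional[str] = None,
              unique_attributes: Optional[dict] = None) -> dict:
    """Describe one end of a relationship: the entity's typeName plus
    exactly one of guid or uniqueAttributes."""
    if (guid is None) == (unique_attributes is None):
        raise ValueError("give exactly one of guid or unique_attributes")
    end = {"typeName": entity_type}
    if guid is not None:
        end["guid"] = guid
    else:
        end["uniqueAttributes"] = dict(unique_attributes)
    return end


def relationship_body(relationship_type: str, end1: dict, end2: dict) -> str:
    """Serialize the body; the top-level typeName is the relationship def
    name (e.g. hive_table_db), never an entity type."""
    return json.dumps({"typeName": relationship_type, "end1": end1, "end2": end2})
```

Usage: relationship_body("hive_table_db", object_id("hive_table", unique_attributes={"qualifiedName": "db.table@cluster"}), object_id("hive_db", unique_attributes={"qualifiedName": "db@cluster"})) produces the same structure as the request in this answer.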

    For a relationship between hive_table and hive_db, the relationship def typeName is hive_table_db.

    So, to create a relationship between a hive_table and a hive_db, the request would be:

    POST: /api/atlas/v2/relationship
    {
        "typeName": "hive_table_db",
        "end1": {
            "typeName": "hive_table",
            "uniqueAttributes": {
                "qualifiedName": "db.table@cluster"
            }
        },
        "end2": {
            "typeName": "hive_db",
            "uniqueAttributes": {
                "qualifiedName": "db@cluster"
            }
        }
    }
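Note the full path: the endpoint lives under /api/atlas/v2/, while the verbose trace in the question shows the request actually going to /v2/relationship, which Atlas answers with its HTML 404 page. Below is a minimal standard-library Python sketch that prepares such a POST with basic auth. The host, credentials and helper name are placeholders, and nothing is sent until you pass the request to urllib.request.urlopen.

```python
import base64
import json
import urllib.request


def relationship_request(base_url: str, user: str, password: str,
                         body: dict) -> urllib.request.Request:
    """Prepare (but do not send) POST /api/atlas/v2/relationship."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/atlas/v2/relationship",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json;charset=UTF-8",
            "Accept": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )
```

To actually send it: with urllib.request.urlopen(req) as resp: created = json.load(resp). Error responses raise urllib.error.HTTPError, whose read() gives the response body.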
    

    For predefined Atlas types, you can find the relationship typeName in the entity type's definition, as the relationshipTypeName of each entry under relationshipAttributeDefs:

    GET: /api/atlas/v2/types/typedef/name/hive_db
    

    This returns the following response; for the tables attribute, relationshipTypeName is hive_table_db:

    {
        "category": "ENTITY",
        "guid": "78c44290-2ed8-461b-953d-3965d9bb44ca",
        "createdBy": "root",
        "updatedBy": "root",
        "createTime": 1548175553859,
        "updateTime": 1548175822249,
        "version": 2,
        "name": "hive_db",
        "description": "hive_db",
        "typeVersion": "1.2",
        "serviceType": "hive",
        "attributeDefs": [
            {
                "name": "clusterName",
                "typeName": "string",
                "isOptional": false,
                "cardinality": "SINGLE",
                "valuesMinCount": 1,
                "valuesMaxCount": 1,
                "isUnique": false,
                "isIndexable": true,
                "includeInNotification": true,
                "searchWeight": -1
            },
            {
                "name": "location",
                "typeName": "string",
                "isOptional": true,
                "cardinality": "SINGLE",
                "valuesMinCount": 0,
                "valuesMaxCount": 1,
                "isUnique": false,
                "isIndexable": false,
                "includeInNotification": false,
                "searchWeight": -1
            },
            {
                "name": "parameters",
                "typeName": "map<string,string>",
                "isOptional": true,
                "cardinality": "SINGLE",
                "valuesMinCount": 0,
                "valuesMaxCount": 1,
                "isUnique": false,
                "isIndexable": false,
                "includeInNotification": false,
                "searchWeight": -1
            },
            {
                "name": "ownerType",
                "typeName": "hive_principal_type",
                "isOptional": true,
                "cardinality": "SINGLE",
                "valuesMinCount": 0,
                "valuesMaxCount": 1,
                "isUnique": false,
                "isIndexable": false,
                "includeInNotification": false,
                "searchWeight": -1
            }
        ],
        "superTypes": [
            "Asset"
        ],
        "subTypes": [],
        "relationshipAttributeDefs": [
            {
                "name": "tables",
                "typeName": "array<hive_table>",
                "isOptional": true,
                "cardinality": "SET",
                "valuesMinCount": -1,
                "valuesMaxCount": -1,
                "isUnique": false,
                "isIndexable": false,
                "includeInNotification": false,
                "searchWeight": -1,
                "constraints": [
                    {
                        "type": "ownedRef"
                    }
                ],
                "relationshipTypeName": "hive_table_db",
                "isLegacyAttribute": false
            },
            {
                "name": "ddlQueries",
                "typeName": "array<hive_db_ddl>",
                "isOptional": true,
                "cardinality": "SET",
                "valuesMinCount": -1,
                "valuesMaxCount": -1,
                "isUnique": false,
                "isIndexable": false,
                "includeInNotification": false,
                "searchWeight": -1,
                "constraints": [
                    {
                        "type": "ownedRef"
                    }
                ],
                "relationshipTypeName": "hive_db_ddl_queries",
                "isLegacyAttribute": false
            },
            {
                "name": "meanings",
                "typeName": "array<AtlasGlossaryTerm>",
                "isOptional": true,
                "cardinality": "SET",
                "valuesMinCount": -1,
                "valuesMaxCount": -1,
                "isUnique": false,
                "isIndexable": false,
                "includeInNotification": false,
                "searchWeight": -1,
                "relationshipTypeName": "AtlasGlossarySemanticAssignment",
                "isLegacyAttribute": false
            }
        ]
    }
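Rather than reading that response by eye, the relationship type names can be pulled out of the parsed JSON. A minimal Python sketch, assuming the typedef response has already been parsed into a dict; the helper names are hypothetical.

```python
import urllib.parse


def typedef_url(base_url: str, type_name: str) -> str:
    """URL of GET /api/atlas/v2/types/typedef/name/{name}."""
    return (base_url.rstrip("/") + "/api/atlas/v2/types/typedef/name/"
            + urllib.parse.quote(type_name))


def relationship_types(entity_typedef: dict) -> dict:
    """Map each relationship attribute name to its relationshipTypeName."""
    return {
        attr["name"]: attr["relationshipTypeName"]
        for attr in entity_typedef.get("relationshipAttributeDefs", [])
        if "relationshipTypeName" in attr
    }
```

Applied to the hive_db response above, relationship_types returns {"tables": "hive_table_db", "ddlQueries": "hive_db_ddl_queries", "meanings": "AtlasGlossarySemanticAssignment"}.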
    

    You also need to make sure that the typeName values in end1 and end2 match the end types declared in the relationship def (endDef1 and endDef2), which you can check in its type definition:

    GET: /api/atlas/v2/types/typedef/name/hive_table_db

    {
        "category": "RELATIONSHIP",
        "guid": "79257a2c-407c-4c0b-b3ae-04b1b3a8d649",
        "createdBy": "root",
        "updatedBy": "root",
        "createTime": 1548175553894,
        "updateTime": 1548175553894,
        "version": 1,
        "name": "hive_table_db",
        "description": "hive_table_db",
        "typeVersion": "1.0",
        "serviceType": "hive",
        "attributeDefs": [],
        "relationshipCategory": "COMPOSITION",
        "propagateTags": "NONE",
        "endDef1": {
            "type": "hive_table",
            "name": "db",
            "isContainer": false,
            "cardinality": "SINGLE",
            "isLegacyAttribute": true
        },
        "endDef2": {
            "type": "hive_db",
            "name": "tables",
            "isContainer": true,
            "cardinality": "SET",
            "isLegacyAttribute": false
        }
    }
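Before posting, the request body can be checked against the relationship def. Below is a minimal Python sketch, assuming both the relationship typedef and the body are parsed JSON dicts; the function name is hypothetical. It compares end1/end2 typeName with endDef1/endDef2 type by exact name, so it does not account for subtypes: an end whose entity is a subtype of the declared type would be flagged even though Atlas may accept it. It also checks propagateTags against the values of Atlas's PropagateTags enumeration (NONE, ONE_TO_TWO, TWO_TO_ONE, BOTH), which are the valid options the question asks about.

```python
# Values of the PropagateTags enumeration in Atlas's relationship model.
VALID_PROPAGATE_TAGS = {"NONE", "ONE_TO_TWO", "TWO_TO_ONE", "BOTH"}


def check_relationship(rel_def: dict, body: dict) -> list:
    """Return a list of problems; an empty list means the body matches the def."""
    problems = []
    # The top-level typeName must be the relationship def's name.
    if body.get("typeName") != rel_def["name"]:
        problems.append(f"top-level typeName should be {rel_def['name']!r}")
    # Each end's entity typeName must equal the declared end type (exact match only).
    for end, end_def in (("end1", "endDef1"), ("end2", "endDef2")):
        expected = rel_def[end_def]["type"]
        actual = body.get(end, {}).get("typeName")
        if actual != expected:
            problems.append(f"{end}.typeName is {actual!r}, expected {expected!r}")
    # propagateTags is optional, but if present it must be a known enum value.
    tags = body.get("propagateTags")
    if tags is not None and tags not in VALID_PROPAGATE_TAGS:
        problems.append(f"propagateTags {tags!r} is not one of {sorted(VALID_PROPAGATE_TAGS)}")
    return problems
```

Run against the hive_table_db def above, the first request from the question (typeName hdfs_path at the top level and on both ends) yields three problems, while its propagateTags value TWO_TO_ONE passes.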