
How to export edgelist from Grakn without using client APIs


I'm trying to export edges from Grakn. I can do that with the Python client like so:

from grakn.client import GraknClient
from tqdm import tqdm

KEYSPACE = "my_keyspace"  # placeholder: replace with your keyspace name

edge_query = "match $c2c($c1, $c2) isa c2c; $c1 has id $id1; $c2 has id $id2; get $id1, $id2;"
with open("grakn.edgelist", "w") as outfile:
    with GraknClient(uri="localhost:48555") as client:
        with client.session(keyspace=KEYSPACE) as session:
            with session.transaction().read() as read_transaction:
                answer_iterator = read_transaction.query(edge_query)
                for answer in tqdm(answer_iterator):
                    # Each answer is a ConceptMap; .value() reads the attribute's value
                    id1 = answer.get("id1")
                    id2 = answer.get("id2")
                    outfile.write(f"{id1.value()} {id2.value()}\n")

Edit: For each relation, I want to export its entities pairwise. The output can be a pair of Grakn IDs; I can ignore the attributes of the relation and of the entities.

Exporting edges seems like a common task. Is there a better way (more elegant, faster, more efficient) to do it in Grakn?


Solution

  • This works as long as the relation type c2c always has exactly two roleplayers. However, it will produce two edges for every pair $c1, $c2, which is probably not what you want.

    Take a pair of Things with ids V123 and V456. If they satisfy $c2c($c1, $c2) isa c2c; with $c1 = V123 and $c2 = V456, they also satisfy the same pattern with $c1 = V456 and $c2 = V123. Grakn returns every combination of $c1, $c2 that satisfies your query, so you get two answers back for this one c2c relation.
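    A toy model makes the duplication concrete. The sketch below is plain Python, not Grakn: it assumes a relation whose roleplayers all fill the same role and treats every ordered assignment of two distinct roleplayers to $c1, $c2 as one answer. The function name unroled_answers is hypothetical.

```python
from itertools import permutations


def unroled_answers(role_players):
    """Model the answers to `$r($c1, $c2)` for a single relation.

    With no role names in the pattern, any roleplayer may bind to $c1 and
    any *other* roleplayer to $c2, so each ordering is a distinct answer.
    This is a simplified illustration, not Grakn's actual matcher.
    """
    return list(permutations(role_players, 2))


for c1, c2 in unroled_answers(["V123", "V456"]):
    print(c1, c2)
# V123 V456
# V456 V123
```

    With three roleplayers in one relation the same model already gives six answers, which is why the untyped pattern only behaves sensibly for strictly binary relations.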

    Assuming that isn't what you want: if $c1 and $c2 play different roles in the relation c2c (which likely implies the edge has a direction), change the query to name the roles:

    edge_query = "match $c2c(role1: $c1, role2: $c2) isa c2c; $c1 has id $id1; $c2 has id $id2; get $id1,$id2;"
    

    If they both play the same role (implying undirected edges), we need to handle this in our own logic. Either store the edges as a set of unordered id pairs to remove duplicates with little effort (in Python a set cannot contain plain sets, so use frozensets or sorted tuples), or consider using the Python Concept API, something like this:
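    A minimal sketch of the set-based deduplication, assuming you have already collected the (id1, id2) values from the original query's answers into Python tuples. Each pair is normalised to a sorted tuple so both orderings land on the same key; undirected_edges and write_edgelist are hypothetical helper names.

```python
import io


def undirected_edges(pairs):
    """Collapse ordered (id1, id2) answers into unique undirected edges.

    Each pair is normalised to a sorted tuple, so (a, b) and (b, a) map to
    the same key. A frozenset would also work, but it collapses a self-loop
    (a, a) to a single element.
    """
    return {tuple(sorted(pair)) for pair in pairs}


def write_edgelist(edges, outfile):
    """Write one 'id1 id2' line per edge in a deterministic order."""
    for id1, id2 in sorted(edges):
        outfile.write(f"{id1} {id2}\n")


buf = io.StringIO()
write_edgelist(undirected_edges([("V123", "V456"), ("V456", "V123")]), buf)
print(buf.getvalue(), end="")  # V123 V456
```

    In the question's loop you would gather pairs as (id1.value(), id2.value()) and write the file once iteration finishes. This assumes the id values are mutually comparable (for example all strings), since sorted() has to order them.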

    from grakn.client import GraknClient

    KEYSPACE = "my_keyspace"  # placeholder: replace with your keyspace name

    relation_query = "match $rc2c isa c2c; get;"

    with open("grakn.edgelist", "w") as outfile:
        with GraknClient(uri="localhost:48555") as client:
            with client.session(keyspace=KEYSPACE) as session:
                with session.transaction().read() as read_transaction:
                    answer_iterator = read_transaction.query(relation_query)
                    for answer in answer_iterator:
                        relation_concept = answer.get("rc2c")
                        # Maps each Role to the set of Things playing it in this relation
                        role_players_map = relation_concept.role_players_map()

                        role_player_ids = set()
                        for role, things in role_players_map.items():
                            # Here you can do any logic regarding which things play which roles
                            for t in things:
                                # A concept's id is available on the concept object itself,
                                # so you don't need to ask for it in the query
                                role_player_ids.add(t.id)
                        # sorted() gives a stable order for each output line
                        outfile.write(", ".join(sorted(role_player_ids)) + "\n")
    
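    Since the question asks for entities pairwise, the role_player_ids set built for each relation can be expanded into pairs. The sketch below uses itertools.combinations; pairwise_edges is a hypothetical helper, and for N-ary relations it emits every pair of roleplayers (a clique expansion), which is one modelling choice among several.

```python
from itertools import combinations


def pairwise_edges(role_player_ids):
    """Expand one relation's roleplayer ids into undirected pairwise edges.

    A binary relation yields one edge; an N-ary relation yields every pair
    of its roleplayers, N * (N - 1) / 2 edges in total.
    """
    return list(combinations(sorted(role_player_ids), 2))


print(pairwise_edges({"V456", "V123"}))               # [('V123', 'V456')]
print(len(pairwise_edges({"V1", "V2", "V3", "V4"})))  # 6
```

    Inside the Concept API loop you could then write one line per pair, e.g. outfile.write(f"{a} {b}\n") for each a, b in pairwise_edges(role_player_ids), instead of joining all ids onto one line.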

    Of course, I don't know what you're doing with the resulting edgelist, but for completeness: the more Grakn-esque approach is to treat the Relation as a first-class citizen, since it represents a hyperedge in the Grakn knowledge model. In that case the Roles of the relation become the edges, which means we aren't stuck when we have ternary or N-ary relations. We can do this by changing the query to:

    match $c2c($c) isa c2c; get;
    

    Each answer then contains the $c2c relation and one of its roleplayers $c; their ids give an edge from the relation node to that roleplayer.
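    A sketch of what to do with those answers, assuming each one has been reduced to a (relation_id, role_player_id) tuple, for instance via answer.get("c2c").id and answer.get("c").id in the same Grakn 1.x client loop as above. Every relation becomes a node joined to each of its roleplayers (a star, or bipartite, expansion). star_expansion and members_by_relation are hypothetical names, and the sample ids are made up.

```python
from collections import defaultdict


def star_expansion(answers):
    """Turn (relation_id, role_player_id) answers into bipartite edges.

    Each relation becomes its own node, joined to each of its roleplayers,
    so binary, ternary and N-ary relations are handled the same way.
    Duplicates are dropped defensively and the output order is sorted.
    """
    return sorted(set(answers))


def members_by_relation(edges):
    """Group the bipartite edges back into relation -> roleplayer ids."""
    members = defaultdict(set)
    for relation_id, player_id in edges:
        members[relation_id].add(player_id)
    return dict(members)


answers = [("R1", "V123"), ("R1", "V456"),
           ("R2", "V123"), ("R2", "V456"), ("R2", "V789")]
edges = star_expansion(answers)
print(sorted(members_by_relation(edges)["R2"]))  # ['V123', 'V456', 'V789']
```

    Keeping the relation as a node loses nothing: the pairwise view can always be recovered later from members_by_relation, whereas a pairwise export cannot tell a ternary relation apart from three binary ones.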