Search code examples
arangodbarangojs

ArangoDB efficient traversal via node attributes


In OrientDB, each vertex has connected edges attached. This means that it possible to explicitly walk nodes from a collection by using nested "select" statements.

As an example: Given a path of node attributes, find the matching end node. The path consists of a list of node attributes (for example, the kind is unique in the nodes in a path).

Now, suppose I had a tree:

kind=site, name=site1
  -- kind=project, name=project1
  -- kind=library, name=library1
kind=site, name=site2
  -- kind=project, name=project2
  -- kind=library, name=library1

A user wants the information from a library called library1 with the path:

[{kind=site},{kind=project,name=project1},{kind=library,name=library1}]

Not having each node fully qualified for traversal is okay as long as the result is a single node.

In OrientDB, the process would be:

  • start with all nodes of kind=site
  • walk through the "child" edge and collect all objects that are of kind=project and name=project1
  • walk through the child edge and collect all objects that are of kind=library and name=library1

This can be done in a nested select statement. The kind field is indexed so the starting nodes are collected quickly from a large number of objects. To further improve performance, I know which kinds are in which tables (collections) so I could scope the number of objects to select from (select from <table> where kind=site).

In ArrangoDB, only the edges have the node binding information so having a node I can't directly walk through a connected edge. In the GRAPH_TRAVERSAL function, I can specify the starting collection by example. So the example would be {kind=site}. Doesn't that mean the starting list of nodes has to be gleaned by scanning all of the graph edges, basically looking at every node connected to the input of each edge in the entire graph?

How would such a query be formulated (in AQL and/or arangojs) so it wouldn't have to start with a full scan of the objects connected to the edges?

I also don't see how to send an example vertex into the arangojs traversal function. It seems to always want an explicit starting vertex.


Solution

  • With jobs like this in mind we designed the new AQL graph traversal which is going to be released with ArangoDB 2.8 (currently in late beta)

    Regarding your last point, we identify the concerned collections using named graphs which know the collections and their relation in the graph. I'll assume you will be using named graphs therefore.

    Lets use one of the ArangoDB 2.8 Example graphs with arangosh:

    var examples = require("org/arangodb/graph-examples/example-graph.js");
    var graph = examples.loadGraph("traversalGraph");
    

    It has its its ventices in the circles collection, and its edges connecting the vertices in the edges collection:

    db.circles.toArray();
    db.edges.toArray();
    

    We now can use FILTER statements in conjunction with the travesal depth to prune the traversal of a graph branch early in the execution.

    Here we filter for an attribute of a vertex to be found in the first layer of the traversal:

    db._query("FOR v, e, p IN 1..3 " + 
                  "OUTBOUND 'circles/A' " + 
                  "GRAPH 'traversalGraph' " + 
                  "FILTER p.vertices[1]._key != 'G' RETURN v._key");
    

    We also can filter for attributes of edges in a similar manner:

    db._query("FOR v, e, p IN 1..3 "
              "OUTBOUND 'circles/A' " +
              "GRAPH 'traversalGraph' " + 
              "FILTER p.edges[0].label != 'right_foo' RETURN v._key");
    

    The AQL graph traversal chapter also has an in depth explanation, in which way the traverser walks over the graph.