Migrating a filterVertices UDF from ArangoDB 2.8 to ArangoDB 3


I am currently migrating a TRAVERSAL function call from ArangoDB 2 to ArangoDB 3. The AQL query had a custom leaf visitor and a filterVertices option with a custom AQL function (for more specific filtering).

FOR result IN TRAVERSAL(
    page, 
    menu, 
    "page/99999999999999",
    "inbound",
    {filterVertices : "udf::customFilter", visitor : "udf::customVisitor", }
 ) RETURN result

The leaf visitor UDF was relatively easy to transfer since it just creates a custom object, but I am having trouble with the filterVertices UDF, since the graph functions have been removed in ArangoDB 3.

There are a few cases like the one below in the filterVertices UDF:

    //check the page status
    if (mismatch == 1) {
        //stop traversal and not return mismatched
        return ['exclude', 'prune'];
    } else if (mismatch == 2) {
        //stop but return mismatched
        return 'prune';
    } else {
        //exclude mismatched but continue
        return 'exclude';
    }

My question is: how exactly should prune and exclude be translated into FILTER conditions in the AQL below?

FOR v, d, p IN 1..10 INBOUND "page/99999999999999" menu 
    LET filtered = CALL('udf::customFilter',v,p) 
    LET result = CALL('udf::customVisitor',v,d,p) 
RETURN {filtered:filtered,result:result}

Will performance be affected if I use the UDF as is, assign its result with LET, and filter out the excluded vertices manually?


Solution

  • Update: JavaScript-based traversals are not available anymore. You can use the traversal language construct of AQL instead: https://docs.arangodb.com/3.11/aql/graphs/traversals/


    Generally speaking, you get "prune" plus "exclude" when you write the filter on the path object (in your case p). Here the optimizer will recognize that any longer path cannot fulfill the condition. Examples:

    FILTER p.edges[1].type == 'FOO'
    FILTER p.edges[*].label ALL == 'BAR'
    FILTER p.vertices[*].age ALL >= 18
    

    The first will prune whenever the second edge does not have type FOO. The second will prune as soon as it finds a label != BAR, and so on. The optimizer can only recognize checks on a specific depth or the global checks ALL, NONE and ANY.
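
    To see why a condition on the whole path behaves like the old ['exclude', 'prune'], here is a minimal sketch in plain JavaScript. It is not ArangoDB code: the `paths` array, the vertex keys and the `status` attribute are made up for illustration, and each entry stands in for the `p.vertices` of one traversal result. Once a vertex on the path fails the condition, that path and every longer path through the same vertex are dropped.

```javascript
// Hypothetical traversal output: one entry per result row, holding the
// vertices on the path from the start vertex to the current vertex.
// Shape of the made-up graph: a -> b -> d and a -> c.
const paths = [
  [{ key: "a", status: "ok" }],
  [{ key: "a", status: "ok" }, { key: "b", status: "draft" }],
  [{ key: "a", status: "ok" }, { key: "b", status: "draft" }, { key: "d", status: "ok" }],
  [{ key: "a", status: "ok" }, { key: "c", status: "ok" }],
];

// Plain-JS analogue of: FILTER p.vertices[*].status ALL == 'ok'
const allOk = (path) => path.every((vertex) => vertex.status === "ok");

// The vertex each surviving path ends at, i.e. what `v` would be.
const pathFiltered = paths.filter(allOk).map((path) => path[path.length - 1].key);

console.log(pathFiltered); // [ 'a', 'c' ]
```

    Vertex b is excluded, and d is never returned because every path to it runs through b. In the real traversal the optimizer can stop expanding at b instead of filtering afterwards, which is what makes this form cheap.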

    You get "exclude" only if you define the filter on the vertex or edge output, in your case v and d:

    FILTER d.type != "BAR"
    FILTER v.name == "BAZ"
    

    The first will exclude all edges that have type "BAR", the second will only include vertices having name "BAZ". In both cases the traversal will go on.
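
    The contrast with a path condition can be sketched in plain JavaScript. This is not ArangoDB code: the `rows` array, the vertex keys and the `status` attribute are assumptions made for illustration. A condition on `v` looks only at the current vertex, so a vertex below a mismatching one is still returned, which corresponds to returning 'exclude' from the old filter function.

```javascript
// Hypothetical result rows of a traversal over a -> b -> d and a -> c,
// where `v` is the current vertex and `depth` its distance from the start.
const rows = [
  { v: { key: "a", status: "ok" }, depth: 0 },
  { v: { key: "b", status: "draft" }, depth: 1 },
  { v: { key: "d", status: "ok" }, depth: 2 },
  { v: { key: "c", status: "ok" }, depth: 1 },
];

// Plain-JS analogue of: FILTER v.status == 'ok'
const vertexFiltered = rows
  .filter((row) => row.v.status === "ok")
  .map((row) => row.v.key);

console.log(vertexFiltered); // [ 'a', 'd', 'c' ]
```

    Here b is dropped from the output, but d, which is only reachable through b, is still visited and returned.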

    Right now there is no option to say "prune" and "include", i.e. to stop at the mismatching vertex but still return it.

    Using a UDF only to implement the filtering is extremely bad for performance, because the UDF is a "black box" for AQL and in particular cannot be optimized into the traversal for pruning. Still, the AQL traversal performed orders of magnitude better in our internal tests, which is why we decided to go that way.

    Unfortunately, UDFs are slightly more flexible than AQL alone, so there may be some functions that cannot be translated to FILTER-only statements. However, there is still an option to execute these traversals in the same way as before 3.0 by defining the entire traversal as a user-defined function. This should perform about the same as before (the high-level algorithm is identical, but we changed many other internal parts in 3.0, which has performance side effects here).

    Your new UDF should roughly look like this and take the startVertex as input:

    var db = require("internal").db;
    var traversal = require("@arangodb/graph/traversal");
    var config = {
      datasource: traversal.collectionDatasource("menu"),
      filter: db._aqlfunctions.document("UDF::CUSTOMFILTER").code,
      visitor: db._aqlfunctions.document("UDF::CUSTOMVISITOR").code,
      maxDepth: 1 // has to be defined
    };
    var result = {
      visited: {
        vertices: [ ],
        paths: [ ]
      }
    };
    var traverser = new traversal.Traverser(config);
    traverser.traverse(result, startVertex);
    // [...] Do stuff with result here
    

    If you need help translating the UDF to FILTER statements or getting the full traversal UDF up and running, please contact us directly via https://groups.google.com/forum/#!forum/arangodb. It may take a few mails to sort out all the details.