Search code examples
graph-theorygraph-databasesarangodbaqldisjoint-sets

Disjoint subgraphs in ArangoDB AQL


Given a graph database I have an actors vetices collection that has edges connecting it to a scenes vertices collection.

I'm having trouble creating a query where I can select all the disjoint subgraphs, that is: given subgraphs A and B in the database, there's no edge (OUTBOUND or INBOUND) connecting sub-graph A to sub-graph B.

From this AQL:

FOR actor IN actors
FOR v, e, p IN 1..1 ANY actor._id acts_in_scenes
    RETURN e

I can get the following graph result as in the uploaded image, the result that I want is a list of lists containing all of the vertexes inside disjoint subgraphs (both actors and scenes). Examples were circled in red.

all-subgraphs

I've made several attempts using subqueries and collects, I think the best result up to this point is the query bellow, but still it's just returning me scenes:

FOR actor IN actors
    LET sub_result = (FOR v, e, p IN 1..1 ANY actor._id acts_in_scenes RETURN DISTINCT v._id) // this just returns me scenes
    FILTER LENGTH(sub_result) > 0
    RETURN DISTINCT sub_result

Does anyone know if this is possible to solve with an AQL query?

EDIT: So I've increased the depth to 5 (1..5) in the graph traversal subquery and now I can get actors vertices. Now the problem is when checking the result json, I can see repeated scene keys on the groups, this should not be possible if this result would represent the disjoint sub-graphs:

FOR actor IN actors
    LET sub_result = (
        FOR v, e, p IN 1..5 ANY actor._id acts_in_scenes 
        SORT v._id
        RETURN DISTINCT v._id
    )
    FILTER LENGTH(sub_result) > 0
    SORT COUNT(sub_result) DESC
    RETURN DISTINCT { 'count': COUNT(sub_result), result: sub_result }

EDIT 2:

I had to solve this by creating a graph on the application side using networkx library and using the nx.connected_components() function. But I really wish I could solve this by using just the graph features of the database, since it adds complexity to the application and mandates me to create a graph in memory on the app side.


Solution

  • In ArangoDB v3.7, a new Pregel algorithm wcc for finding Weakly Connected Components was added:

    https://www.arangodb.com/docs/stable/release-notes-new-features37.html#pregel

    It allows you to compute the subgraphs upfront. In arangosh:

    var pregel = require("@arangodb/pregel");
    
    var params = {
      "maxGSS": db.actors.count(), /* the number of vertices */
      "resultField": "subgraph"
    };
    
    var id = pregel.start("wcc", {
      vertexCollections:["actors"],
      edgeCollections:["acts_in_scenes"]
    }, params);
    

    When it's done (pregel.status(id).state equals "done"), each actor document will have a numeric attribute "subgraph" that is the same for all vertices of the same subgraph.