Search code examples
arangodbaql

Efficient ArangoDB AQL query that can find all Edges in Collection that go To and From the same Documents


I have an ArangoDB with two collections, one for Documents, and one for Edges, I'd like an efficient query capable of only returning the Edges that share the same _to and _from values.

For example, assume my Documents Collection contains A, B, C, and D. Let's also assume my Edges Collection contains X, Y, and Z.

Edge X is _from A and _to B.

Edge Y is _from A and _to B as well.

Edge Z is _from C and _to D.

However, I do not know that X and Y are basically the same Edge, and I don't know that the Documents that share similar Edges are A & B. My query is to seek out these duplicate Edges.

What I have so far looks like this:

FOR ec1 IN edge_collection
    FOR ec2 IN edge_collection
        FILTER ec1._key != ec2._key AND ec1._to == ec2._to AND ec1._from == ec2._from
        RETURN ec1

This seems to work, though it also feels terribly inefficient. Is there a better way to go about doing this, or is this the best possible solution with AQL?


Solution

  • You can group by _from and _to, count how many edges there are per group, and filter out the unique combinations:

    FOR ec IN edge_collection
      COLLECT from = ec._from, to = ec._to WITH COUNT INTO count
      FILTER count > 1
      RETURN { from, to, count }
    

    Or if you want to return the edge keys as well:

    FOR ec IN edge_collection
      COLLECT from = ec._from, to = ec._to INTO edges = ec._key
      LET count = LENGTH(edges)
      FILTER count > 1
      RETURN { from, to, count, edges }
    

    Alternatively using an aggregation instead of a post-calculation:

    FOR ec IN edge_collection
      COLLECT from = ec._from, to = ec._to AGGREGATE count = LENGTH(1) INTO edges = ec._key
      FILTER count > 1
      RETURN { from, to, count, edges }
    

    To return the full edges use INTO edges = ec instead. You could also use just INTO edges but then each edge will be nested in an object {"ec": … }.