I have an ArangoDB with two collections, one for Documents, and one for Edges, I'd like an efficient query capable of only returning the Edges that share the same _to
and _from
values.
For example, assume my Documents Collection contains A, B, C, and D. Let's also assume my Edges Collection contains X, Y, and Z.
Edge X is _from
A and _to
B.
Edge Y is _from
A and _to
B as well.
Edge Z is _from
C and _to
D.
However, I do not know that X and Y are basically the same Edge, and I don't know that the Documents that share similar Edges are A & B. My query is to seek out these duplicate Edges.
What I have so far looks like this:
FOR ec1 IN edge_collection
FOR ec2 IN edge_collection
FILTER ec1._key != ec2._key AND ec1._to == ec2._to AND ec1._from == ec2._from
RETURN ec1
This seems to work, though it also feels terribly inefficient. Is there a better way to go about doing this, or is this the best possible solution with AQL?
You can group by _from
and _to
, count how many edges there are per group, and filter out the unique combinations:
FOR ec IN edge_collection
COLLECT from = ec._from, to = ec._to WITH COUNT INTO count
FILTER count > 1
RETURN { from, to, count }
Or if you want to return the edge keys as well:
FOR ec IN edge_collection
COLLECT from = ec._from, to = ec._to INTO edges = ec._key
LET count = LENGTH(edges)
FILTER count > 1
RETURN { from, to, count, edges }
Alternatively using an aggregation instead of a post-calculation:
FOR ec IN edge_collection
COLLECT from = ec._from, to = ec._to AGGREGATE count = LENGTH(1) INTO edges = ec._key
FILTER count > 1
RETURN { from, to, count, edges }
To return the full edges use INTO edges = ec
instead. You could also use just INTO edges
but then each edge will be nested in an object {"ec": … }
.