gremlin, tinkerpop, amazon-neptune

Efficient graph traversal to find connected components


I'm new to graph databases. I have a graph as shown in the attached image.

[image: diagram of the graph]

I want to find a connected path such as "A1,E1,A2,D2,A3". For this, I wrote the following query:

g.V().hasLabel('A1').repeat(inE('edge').outV().outE().inV().cyclicPath()).times(5).path().map(unfold().not(hasLabel('edge')).fold()).count()

The label of every edge is "edge". This query gives me output like the following:

A1,E1,A2,B2,A2

A1,E1,A1,D1,A1

A1,E1,A2,D2,A3

How can I modify my query so that it returns only "A1,E1,A2,D2,A3" and avoids the other combinations? I'm interested only in the connections between different A's (A1, A2, and A3) and in what connects them. I'm not interested in (A1,B1,A1,C1,A1,D1,A1,E1,A1), because those are all attributes that belong to A1. I want to find the attributes that connect different A's, as in "A1,E1,A2,D2,A3".

Thanks,


Solution

  • I would try not to use labels as unique identifiers. Labels are meant for low-cardinality groupings. Instead, use the vertex ID or a property on each vertex to hold its unique name.

    You could potentially use the first letter of your identifiers as the label, though. So you could have vertices with labels of A, B, C, D, E but with IDs of A1, A2... etc.

    Once you've done that, the query you're looking for should look something like:

    g.V('A1').
      repeat(both().simplePath()).
      until(hasId('A3')).
      path().
      by(id())
    

    Returns:

    A1, E1, A2, D2, A3
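
To see why `simplePath()` drops results such as A1,E1,A1,D1,A1, here is a rough plain-Python sketch of the same idea: walk edges in both directions, never revisit a vertex already on the current path, and stop when the target ID is reached. The edge list is an assumption reconstructed from the question text (the image is not available), and `EDGES`, `build_undirected`, and `simple_paths` are hypothetical names used only for illustration. They are not part of Gremlin, TinkerPop, or Neptune.

```python
# Assumed edges, read off the question text: each attribute vertex points
# at the A vertex (or vertices) it belongs to. The real graph may differ.
EDGES = [
    ("B1", "A1"), ("C1", "A1"), ("D1", "A1"), ("E1", "A1"),
    ("E1", "A2"), ("B2", "A2"), ("D2", "A2"),
    ("D2", "A3"),
]


def build_undirected(edges):
    """Index edges in both directions, similar in spirit to both()."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, []).append(src)
    return graph


def simple_paths(graph, start, target):
    """Collect every path from start to target that repeats no vertex."""
    results = []

    def walk(path):
        current = path[-1]
        # Stop extending once the target is reached, like until(hasId(...)).
        if current == target and len(path) > 1:
            results.append(path)
            return
        for neighbour in graph.get(current, []):
            # Skip vertices already on this path, like simplePath().
            if neighbour not in path:
                walk(path + [neighbour])

    walk([start])
    return results


if __name__ == "__main__":
    for path in simple_paths(build_undirected(EDGES), "A1", "A3"):
        print(", ".join(path))
```

Under the assumed edges, attribute vertices such as B1 or D1 are dead ends once A1 is already on the path, so the only route that survives is the one through E1 and D2. On a large graph, a Gremlin traversal of this shape can still explore many branches, so bounding the depth or filtering by label inside `repeat()` may be worth considering.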