Search code examples
graph-databasesgremlintitan

How to perform cross join on different vertices in Gremlin/Tinkerpop


I am new in the graph database. It might be a very basic question but any help is really appreciated. I am using Gremlin/Tinkerpop 3 to query a graph stored in TitanDB.

Following is my data set.

User            |  associated_ids
---------------------------------
    A           | [ 1, 2 ]
    B           | [ 3, 4 ]
    c           | [ 5 ]
    D           | [ 4, 2 ]

Now want to get all the user's group who are linked with each other with any of the associated_id. Here [A] & [D] is directly linked as both have common entity [2] and [B] & [D] is directly linked as both have common entity [4]. So indirectly [A], [B] & [D] are linked.

Expected result something like.

    Users       |  associated_ids
    ----------------------------
    A,B,D       | [ 1, 2 ,3, 4]
    c           | [ 5 ]

Following are the steps I have performed to generate the graph.

    a = graph.addVertex(label, "person", "user", "A")
    b = graph.addVertex(label, "person", "user", "B")
    c = graph.addVertex(label, "person", "user", "C")
    d = graph.addVertex(label, "person", "user", "D")

    one = graph.addVertex('rec_id')
    one.property('ids', '1')

    two = graph.addVertex('rec_id')
    two.property('ids', '2')

    three = graph.addVertex('rec_id')
    three.property('ids', '3')

    four = graph.addVertex('rec_id')
    four.property('ids', '4')

    five = graph.addVertex('rec_id')
    five.property('ids', '5')

    a.addEdge('part_of',one)
    a.addEdge('part_of',two)
    b.addEdge('part_of', three)
    b.addEdge('part_of',four)
    c.addEdge('part_of',five)
    d.addEdge('part_of',four)
    d.addEdge('part_of',two)

Now wanted to know how can I get all the users who are associated with each other via any of the rec_id.

What would be the Gremlin query to get the desired output?

Also please let me know if I need to do any change in my graph structure to meet my requirement.


Solution

  • What you're basically looking for, are connected components.

    gremlin> g.V().
               emit(cyclicPath().or().not(both())).
                 repeat(both()).
                 until(cyclicPath()).
               aggregate("p").by(path()).cap("p").
               unfold().limit(local, 1).dedup().
               map(__.as("v").select("p").unfold().
                      filter(unfold().where(eq("v"))).
                      unfold().dedup().order().by(id).fold()).dedup().
               project("Users","associated_ids").
                 by(unfold().hasLabel("person").values("user").fold()).
                 by(unfold().hasLabel("rec_id").values("ids").fold())
    ==>[Users:[A,B,D],associated_ids:[1,2,3,4]]
    ==>[Users:[C],associated_ids:[5]]