
Gremlin: Determine vertices that are in all connections of another vertex


I am using the TinkerPop3 Gremlin Console 3.3.1 to analyze a graph database. I want to determine which vertices have connections that cover all of the corresponding connections of another vertex with the same label. For example, take the TinkerFactory Modern graph with one additional “software” vertex and two additional “created” edges for clarity:

graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
g = graph.traversal() 
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
graph.addVertex(T.label, "software", T.id, 13, "name", "sw")
==>v[13]
g.V("4").addE("created").to(V("13"))
==>e[14][4-created->13]
g.V("6").addE("created").to(V("5"))
==>e[15][6-created->5]

The image below shows the modified Modern graph, with the edges of interest drawn in orange.

[Image: modified TinkerFactory Modern graph]

With this example, I want to determine which people have created a set of software that includes all of the software another person has created. It’s not necessary to know which software. So the results for this example would be:

  • Josh (V(4)) co-created all software created by Marko (V(1))
  • Josh (V(4)) co-created all software created by Peter (V(6))
  • Peter (V(6)) co-created all software created by Marko (V(1))

Another way to word it would be “all software created by Marko also had Josh as a creator,” etc.
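
Stated as a set relation, the requirement is that the software created by one person is a subset of the software created by another. The following is only a plain-Python sketch of that definition, not a Gremlin solution; the `created` dictionary is a hand-written, hypothetical model of the modified Modern graph (lop is V(3), ripple is V(5), sw is V(13)), and `covering_pairs` is a name made up for illustration.

```python
# Hypothetical plain-Python model of the modified Modern graph:
# person name -> set of software names reached via "created" edges.
created = {
    "marko": {"lop"},
    "vadas": set(),                   # vadas has no "created" edges
    "josh": {"lop", "ripple", "sw"},
    "peter": {"lop", "ripple"},       # ripple added by the extra edge 6 -> 5
}


def covering_pairs(created):
    """Return sorted (p1, p2) pairs where p2 co-created everything p1 created."""
    pairs = []
    for p1, sw1 in created.items():
        if not sw1:
            # Skip people without software: the subset test would be
            # vacuously true for them, which is not what is being asked.
            continue
        for p2, sw2 in created.items():
            if p1 != p2 and sw1 <= sw2:   # sw1 is a subset of sw2
                pairs.append((p1, p2))
    return sorted(pairs)


print(covering_pairs(created))
# [('marko', 'josh'), ('marko', 'peter'), ('peter', 'josh')]
```

The three pairs correspond to the three bullet points above, with the second name in each pair being the person who co-created everything.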

The code below is as far as I could get. It’s meant to find the overlapping connections by checking whether the number of software vertices each person shares with “a” equals the total number of software vertices created by “a”. Unfortunately, it doesn’t produce a result.

g.V().has(label,"person").as("a").
    both().has(label,"software").aggregate("swA").
    both().has(label,"person").where(neq("a")).dedup().
    where(both().has(label,"software").
       where(within("swA")).count().
           where(is(eq(select("swA").unfold().count())
           )
        )
    ).as("b").select("a","b").by("name")

Any help is greatly appreciated!


Solution

  • First find all pairs of persons who co-created at least one product.

    g.V().hasLabel('person').as('p1').
      out('created').in('created').
      where(neq('p1')).as('p2').
      dedup('p1','p2').
      select('p1','p2').
        by('name')
    

    From there, you can add a bit of pattern matching with match() to verify that the number of products created by person p1 equals the number of 'created' edges leading from those products to person p2. When the two counts are equal, p2 co-created everything that p1 created.

    g.V().hasLabel('person').as('p1').
      out('created').in('created').
      where(neq('p1')).as('p2').
      dedup('p1','p2').
      match(__.as('p1').out('created').fold().as('x'),
            __.as('x').count(local).as('c'),
            __.as('x').unfold().in('created').where(eq('p2')).count().as('c')).
      select('p1','p2').
        by('name')
    

    The result:

    gremlin> g.V().hasLabel('person').as('p1').
    ......1>   out('created').in('created').
    ......2>   where(neq('p1')).as('p2').
    ......3>   dedup('p1','p2').
    ......4>   match(__.as('p1').out('created').fold().as('x'),
    ......5>         __.as('x').count(local).as('c'),
    ......6>         __.as('x').unfold().in('created').where(eq('p2')).count().as('c')).
    ......7>   select('p1','p2').
    ......8>     by('name')
    ==>[p1:marko,p2:josh]
    ==>[p1:marko,p2:peter]
    ==>[p1:peter,p2:josh]
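
To make the counting logic in the match() step easier to follow, here is a rough plain-Python analogue of it. It is only a sketch under assumptions: `created_edges` is a hand-written, hypothetical copy of the "created" edges in the modified graph, `match_by_counts` is an illustrative name, and the comparison relies on there being at most one "created" edge between any person and any software vertex.

```python
# Hypothetical edge list for the modified Modern graph: (person, software).
created_edges = [
    ("marko", "lop"), ("josh", "lop"), ("josh", "ripple"),
    ("josh", "sw"), ("peter", "lop"), ("peter", "ripple"),
]


def match_by_counts(edges):
    """Mirror the count comparison: |created(p1)| == edges from those products to p2."""
    out = {}       # person -> list of software (like out('created').fold())
    creators = {}  # software -> list of persons (like in('created'))
    for person, sw in edges:
        out.setdefault(person, []).append(sw)
        creators.setdefault(sw, []).append(person)

    results = []
    for p1, x in out.items():
        # Candidate p2: anyone else who co-created at least one product of p1.
        candidates = {p for sw in x for p in creators[sw] if p != p1}
        for p2 in sorted(candidates):
            c = len(x)  # analogue of count(local) on the folded list
            # Analogue of unfold().in('created').where(eq('p2')).count()
            hits = sum(1 for sw in x for p in creators[sw] if p == p2)
            if hits == c:
                results.append((p1, p2))
    return results


print(match_by_counts(created_edges))
```

For example, peter's folded list has two entries (lop and ripple) and josh appears as a creator of both, so the counts agree and the pair is kept; marko appears for only one of them, so that pair is dropped.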