I am using TinkerPop3 Gremlin Console 3.3.1 to analyze a graph database. I want to determine which vertices have connections that overlap all similar connections for other vertices of the same label. For example, using the TinkerFactory Modern graph with an additional “software” vertex and two “created” edges for clarity:
graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
graph.addVertex(T.label, "software", T.id, 13, "name", "sw")
==>v[13]
g.V("4").addE("created").to(V("13"))
==>e[14][4-created->13]
g.V("6").addE("created").to(V("5"))
==>e[15][6-created->5]
The image below shows the Modern graph with my modifications. The edges of interest are drawn in orange.
[Image: modified TinkerFactory Modern graph]
With this example, I would want to determine which people have created software that encompasses all the software that another person has created. It’s not necessary to know which software. So the results of this example would be:
Another way to word it would be “all software created by Marko also had Josh as a creator,” etc.
The code below is as far as I could get. It is meant to find the overlapping connections by checking whether the number of software vertices shared between each person and “a” equals the total number of software vertices created by “a”. Unfortunately, it returns no results.
gremlin> g.V().has(label,"person").as("a").
           both().has(label,"software").aggregate("swA").
           both().has(label,"person").where(neq("a")).dedup().
           where(both().has(label,"software").
                 where(within("swA")).count().
                 where(is(eq(select("swA").unfold().count())
                 )
                 )
           ).as("b").select("a","b").by("name")
Any help is greatly appreciated!
First, find all pairs of persons who co-created at least one product:
g.V().hasLabel('person').as('p1').
out('created').in('created').
where(neq('p1')).as('p2').
dedup('p1','p2').
select('p1','p2').
by('name')
From there you can add a bit of pattern matching to verify that the number of created products of person p1 matches the number of connections from those products to person p2.
g.V().hasLabel('person').as('p1').
out('created').in('created').
where(neq('p1')).as('p2').
dedup('p1','p2').
match(__.as('p1').out('created').fold().as('x'),
__.as('x').count(local).as('c'),
__.as('x').unfold().in('created').where(eq('p2')).count().as('c')).
select('p1','p2').
by('name')
The result:
gremlin> g.V().hasLabel('person').as('p1').
......1> out('created').in('created').
......2> where(neq('p1')).as('p2').
......3> dedup('p1','p2').
......4> match(__.as('p1').out('created').fold().as('x'),
......5> __.as('x').count(local).as('c'),
......6> __.as('x').unfold().in('created').where(eq('p2')).count().as('c')).
......7> select('p1','p2').
......8> by('name')
==>[p1:marko,p2:josh]
==>[p1:marko,p2:peter]
==>[p1:peter,p2:josh]
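As a possible alternative to comparing counts, the same containment check can be phrased as a negation: keep a pair only if p1 has no created software that lacks an incoming created edge from p2. The following is an untested sketch, not a verified query. It assumes the same modified Modern graph and that the p1 and p2 path labels remain visible inside the nested not() traversals, as they are inside match() in the query above.

```groovy
// Assumption: g is the traversal source for the modified Modern graph.
// A pair (p1, p2) is dropped if p1 created any software that p2 did not also create.
g.V().hasLabel('person').as('p1').
  out('created').in('created').
  where(neq('p1')).as('p2').
  dedup('p1','p2').
  not(__.select('p1').out('created').
        not(__.in('created').where(eq('p2')))).
  select('p1','p2').
    by('name')
```

The inner not() picks out software created by p1 but not by p2, and the outer not() discards any pair for which such software exists. This avoids the fold() and the two counts, at the cost of being a little harder to read.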