Tags: azure, azure-cosmosdb, gremlin, tinkerpop, azure-cosmosdb-gremlinapi

Gremlin Query to check for pairs of edges on vertices


For some context: I am currently using Azure Cosmos DB with the Gremlin API. Because of its storage-scaling architecture, a '.out()' operation is much less expensive than a '.in()' operation, so I always create edges in both directions and then use '.out()' on whichever edge matches the direction I want to query.

We use the graph to associate events with users. Whenever a user 'U' raises an event 'E', we create two edges:

g.V('U').addE('raisedEvent').to(g.V('E'))
g.V('E').addE('raisedByUser').to(g.V('U'))

Very rarely, one of these queries fails for one reason or another and we end up with only a single edge between the two vertices. I've been trying to find a way to query for all vertices that have only a uni-directional relationship given a set of 'paired' edge-labels, in order to find these errors and re-create the missing edge.

Basically I need a query where...

  • given a pair of edge labels, E1 (outgoing, V1-->V2) and E2 (incoming, V1<--V2),
  • finds all vertices V1 that have at least one outgoing edge E1 to another vertex V2 where V2 doesn't have an edge E2 going back to V1; and vice-versa

Example:

// given a graph
g.addV('user').property('id','user_1')
g.addV('user').property('id','user_2')
g.addV('user').property('id','user_3')
g.addV('user').property('id','user_4')
g.addV('event').property('id','event_1')
g.addV('event').property('id','event_2')
g.addV('event').property('id','event_3')
g.addV('event').property('id','event_4')

g.V('user_1').addE('raisedEvent').to(g.V('event_1')).V('event_1').addE('raisedByUser').to(g.V('user_1'))
g.V('user_2').addE('raisedEvent').to(g.V('event_2')).V('event_2').addE('raisedByUser').to(g.V('user_2'))
g.V('user_2').addE('raisedEvent').to(g.V('event_3'))
g.V('event_4').addE('raisedByUser').to(g.V('user_3'))

// i.e.
//                (user_1) <--> (event_1)
// (event_2) <--> (user_2) ---> (event_3)
// (event_4) ---> (user_3)
//                (user_4)

// Then, the query should match user_2 and user_3...
// ...as they have uni-directional links to events

Edit: Note that the Cosmos DB implementation of the 'is()' step doesn't support taking a traversal as its argument, i.e. queries such as


where(__.outE('raisedEvent').count().is(__.out('raisedEvent').outE('raisedByUser').count()))

are currently unsupported in Cosmos DB.

If possible, it would also be great to get a list of which pairs of vertices have a bad link (e.g. in this case [(user_2, event_3), (user_3, event_4)]), but just knowing which vertices have a bad link will be very useful already.


Solution

  • Thanks to Kelvin Lawrence, I ended up using this pattern to get a list of vertex id pairs that are only uni-directionally connected from a to b:

    g.V().hasLabel("user").as('a').out('raisedEvent').where(__.not(out('raisedByUser').as('a'))).as('b').select('a','b').by('id')
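
  • The pattern above only catches pairs where the user-to-event edge exists and the edge back is missing. Below is a minimal sketch of the mirrored check (event to user), a variant that returns only the affected vertices, and a possible repair step. It assumes the same vertex and edge labels as in the question ('user', 'event', 'raisedEvent', 'raisedByUser'). The repair traversal in particular is an assumption that has not been verified against Cosmos DB, so it is probably worth trying on a small subset first.

    ```groovy
    // Mirrored check: events whose 'raisedByUser' edge has no matching
    // 'raisedEvent' edge coming back from the user.
    g.V().hasLabel('event').as('a').out('raisedByUser').where(__.not(__.out('raisedEvent').as('a'))).as('b').select('a','b').by('id')

    // Vertex-only variant: the distinct ids of users with at least one bad link.
    g.V().hasLabel('user').as('a').out('raisedEvent').where(__.not(__.out('raisedByUser').as('a'))).select('a').by('id').dedup()

    // Possible repair (assumption: addE() with step labels in from()/to()
    // behaves on Cosmos DB as it does in Apache TinkerPop): for each bad
    // pair, add the missing 'raisedByUser' edge from event 'b' to user 'a'.
    g.V().hasLabel('user').as('a').out('raisedEvent').where(__.not(__.out('raisedByUser').as('a'))).as('b').addE('raisedByUser').from('b').to('a')
    ```

    Running the first two queries before and after the repair gives a simple way to confirm that the bad pairs are gone. The mirrored repair (adding a missing 'raisedEvent' edge) would follow the same shape with the labels swapped.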