
Gremlin, get two vertices that both have an edge to each other


Imagine you have 2000 people. Any person can choose to like another, which creates a directed edge between them: if A likes B, that does not necessarily mean that B likes A. How would I write a Gremlin query to find every pair of people who like each other, that is, where A likes B AND B likes A?

I've been looking around the internet and found .both('likes'). However, from what I understand, this returns everyone who likes someone or is liked by someone, not only the cases where both hold at the same time.

I've also found this:

g.V().hasId('1234567').as('y').
  out('likes').
  where(__.in('likes').as('y'))

This works for one person, but I can't figure out how to make it work for multiple people.

To me this seems like a simple enough problem for a graph, but I can't find any solution online. Everything I've read seems to imply that the data should be structured so that, if A likes B, B also likes A. That is achievable: when you create the edge for A likes B, you can check whether B already likes A and, if so, insert a special edge such as A inRelationshipWith B.

The query for this would be g.V().both('inRelationshipWith'), which would make things easier.

Is this an issue with how the data is structured and I am potentially using a graph database incorrectly, or is there actually a simple way to achieve what I want that I am missing?


Solution

  • You almost had it. Remember that, from the other vertex's point of view, the relationship back to the starting vertex is also an out relationship. The following query uses the air-routes data set to find all pairs of airports that have a route in both directions (analogous to your mutual 'likes' case):

    g.V().
       hasLabel('airport').as('a').
       out().as('b').
       where(out().as('a')).
       select('a','b').
         by('code')
    

    This returns each mutual pair twice, once from each airport's (friend's) side, for example:

    [a:DFW,b:AUS]
    [a:AUS,b:DFW]
    

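    If you only want each pair once, adding a dedup step reduces the result set to one entry per mutual relationship. Sorting the values of each result first with order(local) makes [DFW, AUS] and [AUS, DFW] identical, so dedup keeps just one of them.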

    g.V().
       hasLabel('airport').as('a').
       out().as('b').
       where(out().as('a')).
       select('a','b').
         by('code').
       order(local).
         by(values).
       dedup().
         by(values)
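
    A possible alternative to sorting and deduplicating (a sketch that is not part of the original answer) is to keep only the pairs in which a's code sorts before b's, so each mutual pair can match in just one orientation. It assumes every airport has a unique code value:

    ```groovy
    // Alternative sketch: keep each mutual pair once by imposing an order on the pair.
    // Assumption: 'code' is unique per airport, so exactly one orientation passes lt().
    g.V().
       hasLabel('airport').as('a').
       out().as('b').
       where('a', lt('b')).      // compare the two labeled vertices...
         by('code').             // ...by their 'code' property
       where(out().as('a')).     // b must also have a route back to a
       select('a','b').
         by('code')
    ```

    Filtering on the ordering before checking the return edge also means the return-edge check runs for only about half of the candidate pairs.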
    

    Finding the inverse case (where the relationship is not mutual) is just a matter of wrapping the where condition in a not step.

    g.V().
       hasLabel('airport').as('a').
       out().as('b').
       where(__.not(out().as('a'))).
       select('a','b').
         by('code')
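
    Applied to the graph in the question, the same pattern might look like the sketch below. It assumes the people are vertices labeled 'person' (a hypothetical label; the question only names the 'likes' edge) and identifies each person by vertex id:

    ```groovy
    // Sketch for the 'likes' graph from the question.
    // Assumption: people are vertices labeled 'person' (hypothetical label).
    g.V().
       hasLabel('person').as('a').
       out('likes').as('b').
       where(out('likes').as('a')).   // b must like a back
       select('a','b').
         by(T.id).                    // report each person by vertex id
       order(local).
         by(values).                  // sort each pair so both orientations look the same
       dedup().
         by(values)                   // keep one entry per mutual pair
    ```

    Dropping the order and dedup steps returns both orientations of each pair, and wrapping the where condition in __.not(...) lists the one-sided likes instead.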