gremlin azure-cosmosdb-gremlinapi

Find the subtree difference in Gremlin


I described the graph here on gremlify. I have four types of vertices: Content, User, Group, and Video. Content and Group serve as containers; User and Video are the leaves. Edges are possible between a Group and a User, a Group and a Content, and a Group and a Video. A User can also be assigned to a Video or a Content directly, and a Video can be added to a Content vertex. I have to calculate the difference in the tree when a Content is dropped from a Group. I created a query that starts from the Content, aggregates all directly assigned Users, and then subtracts those Users from the set of Group members:

g.V().has('ContentId', 1).in('Assigned').
  choose(label()).
  option('User', __.aggregate('DirectAssign')).
  option('Group', __.out('Added').where(without('DirectAssign')).
      as('ToDrop')).
  select('ToDrop')

However, there are a few drawbacks:

  • I doubt the query is optimal in terms of scale and performance: with 100k users in a group, it consumes almost all of my RUs
  • I need to calculate Video access for each user individually (not a big deal)
  • I can't write this query in our ORM framework, because aggregate creates a new scope there, so the aggregated collection cannot be referenced in the second option step.

So my question is: is it possible to rewrite this as a single query without the choose step?


Solution

  • If I understood your use case correctly, you can find those users by traversing back to the content: keep only the group members that have no Assigned edge to the same content. That way you can avoid both the aggregate and choose steps.

    g.V().hasLabel('Content').
        as('content').
      in('Assigned').hasLabel('Group').out('Added').
      not(where(out('Assigned').
            as('content')))
    

    example: https://gremlify.com/ozhf4t0xv4j
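
    To sanity-check what the traversal is expected to return, here is a minimal plain-Python model of the same set logic. It is not Gremlin and does not touch Cosmos DB. The vertex names (group1, alice, bob) and the two dictionaries are hypothetical sample data, and the edge directions (User or Group -Assigned-> Content, Group -Added-> User) are assumed from the queries above. It also assumes that Added edges from a Group lead only to Users.

    ```python
    # Hypothetical sample data, not taken from the gremlify graph.
    assigned = {            # source vertex -> Content ids it is assigned to
        "group1": {1},
        "alice": {1},       # alice is also assigned to Content 1 directly
    }
    added = {               # Group -> members added to it
        "group1": {"alice", "bob"},
    }
    groups = set(added)


    def users_to_drop(content_id):
        """Model of the answer: group members with no direct assignment."""
        result = set()
        for group in groups:
            if content_id not in assigned.get(group, set()):
                continue  # models in('Assigned').hasLabel('Group')
            for user in added[group]:  # models out('Added')
                # models not(where(out('Assigned').as('content')))
                if content_id not in assigned.get(user, set()):
                    result.add(user)
        return result


    def users_to_drop_by_subtraction(content_id):
        """Model of the question: group members minus directly assigned users."""
        sources = {v for v, contents in assigned.items() if content_id in contents}
        direct = sources - groups  # directly assigned Users
        members = set().union(*(added[g] for g in sources & groups))
        return members - direct


    print(users_to_drop(1))  # {'bob'}
    ```

    On this sample data both functions return the same set, which is the point of the rewrite: the per-user check against the content replaces the shared aggregated collection.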