Tags: azure-cosmosdb, gremlin, tinkerpop, tinkerpop3, gremlin-server

Union step does not work with multiple elements


The following query returns a user map with an "injected" property called "questions". It works as expected when g.V().has() returns a single user, but not when it returns multiple users:

  return g.V().has("user", "userId", 1)
      .union(
         __.valueMap().by(__.unfold()),
         __.project('questions').by(
            __.outE('response').valueMap().by(__.unfold()).fold()
         )
      )
      .unfold()
      .group()
      .by(__.select(column.keys))
      .by(__.select(column.values));

That works. The problem appears when I change the first line to return multiple users:

g.V().hasLabel("user").union(....

I finish the query by calling .toList(), so I expected to get a list of all the users, each in the same form as the single-user result. Instead, I still get a single user. How can I make the query work for both a single user and multiple users?


Solution

  • When using Gremlin, you have to think in terms of a stream. The stream contains traversers which travel through the steps you've written. In your case, with your initial test of:

    g.V().has("user", "userId", 1)
          .union(
             __.valueMap().by(__.unfold()),
             __.project('questions').by(
                __.outE('response').valueMap().by(__.unfold()).fold()
             )
          )
          .unfold()
          .group()
          .by(__.select(column.keys))
          .by(__.select(column.values))
    

    you have one traverser (i.e. V().has("user", "userId", 1) produces one user) that flows to union(), where it is split so that it goes to both valueMap() and project(), each of which produces a Map. You now have two traversers. unfold() turns those two Maps into a stream of key/value entries, and group() collects the entries into one final Map traverser.

    So with that in mind, what changes when you do hasLabel("user")? Well, you now have more than one starting traverser, which means union() produces two traversers for each of those users. unfold() flattens all of them into one shared stream, and in group() the entries overwrite one another (because every user has the same keys), producing one final Map.
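
The overwriting described above can be modelled in plain JavaScript, with no Gremlin involved. This is only a sketch: the sample users, the helper names `unionMaps` and `groupEntries`, and the assumption that a later entry replaces an earlier one with the same key are all illustrative stand-ins for the behavior the answer describes.

```javascript
// Plain-JavaScript model of the traversal; it does not call Gremlin.
// Hypothetical sample data standing in for two "user" vertices.
const users = [
  { userId: 1, name: "alice", responses: [{ answer: "yes" }] },
  { userId: 2, name: "bob", responses: [] },
];

// Models union(valueMap(), project('questions')): two maps per user.
function unionMaps(user) {
  const { responses, ...props } = user;
  return [props, { questions: responses }];
}

// Models unfold().group() over one flat stream of entries,
// assuming a later entry replaces an earlier one with the same key.
function groupEntries(maps) {
  const result = {};
  for (const m of maps) {
    for (const [key, value] of Object.entries(m)) {
      result[key] = value;
    }
  }
  return result;
}

// Every user feeds the same stream, so only one map comes out,
// holding the values of whichever user was processed last.
const merged = groupEntries(users.flatMap(unionMaps));
console.log(merged);
```

In this model the entries from the first user are present in the stream but are lost when the second user's entries arrive under the same keys.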

    What you really want is to execute union() and the follow-on steps once per initial "user" vertex traverser. You can tell Gremlin to do that with map():

    g.V().has("user", "userId", 1)
          .map(
            __.union(
               __.valueMap().by(__.unfold()),
               __.project('questions').by(
                  __.outE('response').valueMap().by(__.unfold()).fold()
               )
            )
            .unfold()
            .group()
              .by(__.select(column.keys))
              .by(__.select(column.values))
          )
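
The effect of running the inner steps once per user can be sketched in plain JavaScript as well. As before, this is a model rather than Gremlin code: the sample data and the helper names `toMaps` and `mergeMaps` are hypothetical, and the merge assumes that a later entry replaces an earlier one with the same key.

```javascript
// Plain-JavaScript model of map(union(...).unfold().group()); no Gremlin calls.
// Hypothetical sample data standing in for two "user" vertices.
const sampleUsers = [
  { userId: 1, name: "alice", responses: [{ answer: "yes" }] },
  { userId: 2, name: "bob", responses: [] },
];

// Models union(valueMap(), project('questions')): two maps per user.
function toMaps(user) {
  const { responses, ...props } = user;
  return [props, { questions: responses }];
}

// Models unfold().group(): merge the entries of several maps into one.
function mergeMaps(maps) {
  const result = {};
  for (const m of maps) {
    for (const [key, value] of Object.entries(m)) {
      result[key] = value;
    }
  }
  return result;
}

// The inner pipeline runs separately for each user, so each user
// gets its own merged map and nothing is overwritten across users.
const perUser = sampleUsers.map((user) => mergeMaps(toMaps(user)));
console.log(perUser);
```

Because the merge is scoped to a single user, the result is a list with one map per user, which is the shape the question expected from .toList().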
    

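    Finally, you can simplify the final by() modulators by passing the keys and values tokens directly (column.keys and column.values in the question's syntax):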

    g.V().has("user", "userId", 1)
          .map(
            __.union(
               __.valueMap().by(__.unfold()),
               __.project('questions').by(
                  __.outE('response').valueMap().by(__.unfold()).fold()
               )
            )
            .unfold()
            .group()
              .by(keys)
              .by(values)
          )