gremlin amazon-neptune

Can Gremlin aggregate the values of edges connected to the same node?


Suppose you have one node with label 'A'. This node is connected to many nodes with label 'B' via edges with label 'e'. For a given B, there can be many edges between A and B, all with the same label 'e'. Each edge has a property 'p'.

We want to aggregate the 'p' values of all edges that connect A to the same B.

For example, take a particular B. One edge between A and that B has a 'p' value of 'foo', and another edge to the same B has a 'p' value of 'bar'. Their aggregation would be:

{'e': {'p': ['foo', 'bar']}}

How can this be achieved?

At the moment, I have this query:

g.V()
    .hasLabel('A').as('A')
    .outE().hasLabel('e').as('e')
    .inV().hasLabel('B').as('B')
    .select('A', 'e', 'B')
    .by(valueMap())

It would produce an output like this:

[
    {'A': {'name': ['john']}, 'e': {'p': ['foo']}, 'B': {'place': 'Qatar'}},
    {'A': {'name': ['john']}, 'e': {'p': ['bar']}, 'B': {'place': 'Qatar'}},
    {'A': {'name': ['john']}, 'e': {'p': ['hello']}, 'B': {'place': 'Argentina'}},
    {'A': {'name': ['john']}, 'e': {'p': ['goodbye']}, 'B': {'place': 'Argentina'}}
]

Whereas, I would want this:

[
    {'A': {'name': ['john']}, 'e': {'p': ['foo', 'bar']}, 'B': {'place': 'Qatar'}},
    {'A': {'name': ['john']}, 'e': {'p': ['hello', 'goodbye']}, 'B': {'place': 'Argentina'}}
]

Solution

  • Using the data from the question, the following graph can be built:

    g.addV('A').property('name','john').property(id,'J1').as('j').
      addV('B').property('place','Qatar').property(id,'Q1').as('q').
      addV('B').property('place','Argentina').property(id,'A1').as('a').
      addE('e').from('j').to('q').property('p','foo').
      addE('e').from('j').to('q').property('p','bar').
      addE('e').from('j').to('a').property('p','hello').
      addE('e').from('j').to('a').property('p','goodbye')
    

    Using that graph, a nested group step gets close to the output you are looking for. The outer group is keyed by the name of A, the inner group by the place of B, and the 'p' values of the edges are folded into a list:

    g.V().hasLabel('A').as('a').outE('e').as('e').inV().hasLabel('B').as('b').
      group().
        by(select('a').values('name')).
        by(group().
          by(select('b').values('place')).
          by(select('e').values('p').fold()))
    

    Which yields

        {'john': {'Argentina': ['hello', 'goodbye'], 'Qatar': ['foo', 'bar']}}
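
For readers who think more easily in client-side code, the nested group above behaves roughly like the plain-Python grouping below. This is only an illustrative sketch: the `rows` list is a hand-written stand-in for the paths that reach `group()` (values copied from the sample graph), and `nested_group` is a hypothetical helper, not part of any Gremlin client library.

```python
from collections import defaultdict

# Hand-written stand-in for the (a, e, b) paths that reach group():
# (name of A, 'p' value of the edge, place of B), from the sample graph.
rows = [
    ("john", "foo", "Qatar"),
    ("john", "bar", "Qatar"),
    ("john", "hello", "Argentina"),
    ("john", "goodbye", "Argentina"),
]


def nested_group(rows):
    """Group by name, then by place, collecting the edge 'p' values."""
    result = defaultdict(lambda: defaultdict(list))
    for name, p, place in rows:
        result[name][place].append(p)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {name: dict(places) for name, places in result.items()}


grouped = nested_group(rows)
print(grouped)
```

The outer key plays the role of the first `by()` modulator, the inner key the role of the nested group's first `by()`, and the appended list the role of `fold()`.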
    

    Using valueMap instead of values, we can keep the property keys in the result:

    g.V().hasLabel('A').as('a').outE('e').as('e').inV().hasLabel('B').as('b').
      group().
        by(select('a').values('name')).
        by(group().
          by(select('b').valueMap('place')).
          by(select('e').valueMap('p').unfold().group().by(keys).by(values)))
    

    Which produces

    {'john': {{'place': ('Argentina',)}: {'p': ['hello', 'goodbye']}, {'place': ('Qatar',)}: {'p': ['foo', 'bar']}}}
    

    So, what we end up with, for each person (just "john" in this case), is a map keyed by each place they visited, whose values hold the "p" values of the edges that got them there. You can then select into that nested structure any way you need to extract individual pieces, and tweak these building blocks to get whichever variation of the output you prefer.
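
If the exact list-of-rows shape from the question is needed, one option is to reshape the nested result on the client. The sketch below assumes a Python client and starts from the result of the first nested-group query, pasted in as a literal. `to_rows` is a hypothetical helper name, and the row layout simply mirrors the desired output in the question.

```python
# Result of the first nested-group query, copied from the output shown above.
grouped = {"john": {"Argentina": ["hello", "goodbye"], "Qatar": ["foo", "bar"]}}


def to_rows(grouped):
    """Flatten {name: {place: [p, ...]}} into one row per (A, B) pair."""
    return [
        {"A": {"name": [name]}, "e": {"p": list(p_values)}, "B": {"place": place}}
        for name, places in grouped.items()
        for place, p_values in places.items()
    ]


for row in to_rows(grouped):
    print(row)
```

Doing this step on the client keeps the traversal simple. The trade-off is that the full nested map is sent to the client before it is reshaped.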