Tags: gremlin, graph-databases, janusgraph

Gremlin: repeat until breakpoint, and batch the vertices together to produce a value


I'm learning graph databases by building a simple MLM network (a user can sponsor other users, and every user has at most one sponsor). I want to run a query that does the following:

  • Go from a selected user to another user until a certain predicate is satisfied, then sum the points of all users along the selected paths into a single value (deduplicated, so that nothing is counted twice when a user branches out to multiple users).
  • Repeat this step 3 times, each time starting from the last user reached in the previous step.
  • Output the sums as a list.

I've been trying the following query:

    g.V(userID)
     .repeat(
       repeat(out('sponsors'))
         .until(somePredicate)
         .out('hasPoints')
         .as('level') // How do I know the current loop iteration so I can store level1/level2/level3 in as step dynamically?
         // This is where I'm stuck, since I have no idea how to capture and sum all the points in this subtree.
         .in('hasPoints')
     )
     .times(3)
     // Also need to output the point sums as a list/map here, e.g. ["level1": 100, "level2": 100],
     // "level1" being the first iteration of repeat and so on.

Any pointers?

EDIT:

Here's a Gremlin script for sample data:

    g.addV('user').property('id', 1).as('1').
      addV('user').property('id', 2).as('2').
      addV('user').property('id', 3).as('3').
      addV('user').property('id', 4).as('4').
      addV('user').property('id', 5).as('5').
      addV('user').property('id', 6).as('6').
      addV('user').property('id', 7).as('7').
      addV('point').property('value', 5).as('p1').
      addV('point').property('value', 5).as('p2').
      addV('point').property('value', 5).as('p3').
      addV('point').property('value', 5).as('p4').
      addV('point').property('value', 5).as('p5').
      addV('point').property('value', 5).as('p6').
      addV('point').property('value', 5).as('p7').
      addE('sponsors').from('1').to('2').
      addE('sponsors').from('1').to('3').
      addE('sponsors').from('1').to('4').
      addE('sponsors').from('2').to('5').
      addE('sponsors').from('3').to('6').
      addE('sponsors').from('4').to('7').
      addE('hasPoints').from('1').to('p1').
      addE('hasPoints').from('2').to('p2').
      addE('hasPoints').from('3').to('p3').
      addE('hasPoints').from('4').to('p4').
      addE('hasPoints').from('5').to('p5').
      addE('hasPoints').from('6').to('p6').
      addE('hasPoints').from('7').to('p7').
      iterate()

This is a query that I'm writing to group levels together based on some predicate:

    g.V()
        .has('id', 1)
        .repeat('x',
            identity()
                .repeat(
                    out('sponsors')
                        .choose(loops('x'))
                        .option(0, identity().as('a1'))
                        .option(1, identity().as('a2'))
                        .option(2, identity().as('a3'))
                )
                .until(or(out('hasPoints').has('value', gte(5))))
                .sideEffect(
                    choose(loops('x'))
                        .option(0, select(all, 'a1'))
                        .option(1, select(all, 'a2'))
                        .option(2, select(all, 'a3'))
                        .unfold()
                        .choose(loops('x'))
                        .option(0, store('b1'))
                        .option(1, store('b2'))
                        .option(2, store('b3'))
                )
        )
        .times(3)
        .cap('b1', 'b2', 'b3')

Although I can hard-code the labels and pick the right one with choose(), I don't know how to do this dynamically yet. For example, if I need until() instead of times(3), the iteration count is no longer known beforehand.


Solution

  • I've modified your data slightly to include a single "point" value less than 5, to prove that the filtering works properly, and changed the "id" property to T.id so that the results were easier to read while I was testing things:

    g.addV('user').property(id, 1).as('1').
      addV('user').property(id, 2).as('2').
      addV('user').property(id, 3).as('3').
      addV('user').property(id, 4).as('4').
      addV('user').property(id, 5).as('5').
      addV('user').property(id, 6).as('6').
      addV('user').property(id, 7).as('7').
      addV('point').property('value', 5).as('p1').
      addV('point').property('value', 5).as('p2').
      addV('point').property('value', 5).as('p3').
      addV('point').property('value', 5).as('p4').
      addV('point').property('value', 5).as('p5').
      addV('point').property('value', 4).as('p6').
      addV('point').property('value', 5).as('p7').
      addE('sponsors').from('1').to('2').
      addE('sponsors').from('1').to('3').
      addE('sponsors').from('1').to('4').
      addE('sponsors').from('2').to('5').
      addE('sponsors').from('3').to('6').
      addE('sponsors').from('4').to('7').
      addE('hasPoints').from('1').to('p1').
      addE('hasPoints').from('2').to('p2').
      addE('hasPoints').from('3').to('p3').
      addE('hasPoints').from('4').to('p4').
      addE('hasPoints').from('5').to('p5').
      addE('hasPoints').from('6').to('p6').
      addE('hasPoints').from('7').to('p7').
      iterate()
    

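    If you just need to group dynamically based on the level iterated by repeat(), then you can just group() on loops(). loops() returns the zero-based iteration count of the enclosing repeat(), so the keys of the resulting map are the levels: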

    gremlin> g.V(1).
    ......1>   repeat(out('sponsors').
    ......2>          group('m').
    ......3>            by(loops()).
    ......4>            by(out('hasPoints').has('value',gte(5)).
    ......5>               values('value').sum())).
    ......6>   cap('m')
    ==>[0:15,1:10]
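
To make the grouping concrete, here is a rough plain-Python model of what that traversal computes on the sample data above. It is only a sketch and not Gremlin: the `SPONSORS` and `POINTS` dictionaries and the `sums_by_level` function are names made up for illustration, and the `level` counter stands in for `loops()`.

```python
from collections import defaultdict

# Sample data from above: sponsor -> sponsored users, and user -> point value.
# (Hypothetical in-memory stand-ins for the 'sponsors' and 'hasPoints' edges.)
SPONSORS = {1: [2, 3, 4], 2: [5], 3: [6], 4: [7]}
POINTS = {1: 5, 2: 5, 3: 5, 4: 5, 5: 5, 6: 4, 7: 5}


def sums_by_level(root, min_points=5):
    """Sum qualifying points per repeat() iteration, keyed by a 0-based level."""
    totals = defaultdict(int)
    frontier = [root]
    level = 0
    while True:
        # One out('sponsors') hop for every traverser in the current frontier.
        frontier = [child for user in frontier for child in SPONSORS.get(user, [])]
        if not frontier:
            break
        for user in frontier:
            if POINTS[user] >= min_points:  # models has('value', gte(5))
                totals[level] += POINTS[user]
        level += 1
    return dict(totals)


print(sums_by_level(1))
```

On the sample data this prints `{0: 15, 1: 10}`: users 2, 3 and 4 contribute at level 0, and users 5 and 7 at level 1 (user 6 is filtered out because its value is 4).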
    

    You mention that you'd like those values summed, which you can do easily enough with:

    gremlin> g.V(1).
    ......1>   repeat(out('sponsors').
    ......2>          group('m').
    ......3>            by(loops()).
    ......4>            by(out('hasPoints').has('value',gte(5)).
    ......5>               values('value').sum())).
    ......6>   cap('m').
    ......7>   unfold().
    ......8>   select(values).
    ......9>   sum()
    ==>25
    

    Of course, if you just need the total, you can avoid group() completely:

    gremlin> g.V(1).
    ......1>   repeat(out('sponsors').
    ......2>          store('m').
    ......3>            by(coalesce(out('hasPoints').has('value',gte(5)).values('value'), 
    ......4>                        constant(0)))).
    ......5>   cap('m').
    ......6>   sum(local)
    ==>25
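
As a rough illustration of the store()/coalesce() idea, the following plain-Python sketch collects one number per visited user (its points if they qualify, otherwise 0) and sums the list at the end. It is a model rather than Gremlin: `SPONSORS`, `POINTS` and `stored_values` are hypothetical names, and the breadth-first order of the list is an assumption of this sketch, not a guarantee about the order in which Gremlin fills "m".

```python
# Sample data from above: sponsor -> sponsored users, and user -> point value.
# (Hypothetical in-memory stand-ins for the 'sponsors' and 'hasPoints' edges.)
SPONSORS = {1: [2, 3, 4], 2: [5], 3: [6], 4: [7]}
POINTS = {1: 5, 2: 5, 3: 5, 4: 5, 5: 5, 6: 4, 7: 5}


def stored_values(root, min_points=5):
    """Collect one value per visited user: its points if they qualify, else 0."""
    stored = []
    frontier = [root]
    while frontier:
        # One out('sponsors') hop for every traverser in the current frontier.
        frontier = [child for user in frontier for child in SPONSORS.get(user, [])]
        # Models coalesce(<qualifying value>, constant(0)).
        stored.extend(POINTS[u] if POINTS[u] >= min_points else 0 for u in frontier)
    return stored


values = stored_values(1)
print(values, sum(values))
```

This prints `[5, 5, 5, 5, 0, 5] 25`; the single 0 is user 6, whose value of 4 does not pass the filter.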
    

    Finally, if we no longer care about levels, then we can probably go one better, get rid of the "m" side-effect completely, and save that overhead:

    gremlin> g.V(1).
    ......1>   repeat(out('sponsors')).
    ......2>     emit().
    ......3>   out('hasPoints').has('value',gte(5)).
    ......4>   values('value'). 
    ......5>   sum()
    ==>25
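
The same idea can be modeled in plain Python as "visit every user below the start vertex, keep the qualifying point values, and add them up". This is only a sketch under the same assumptions as the sample data: `SPONSORS`, `POINTS`, `descendants` and `total_points` are illustrative names, and the generator plays the role of repeat(out('sponsors')).emit(), which emits every vertex reached but not the start vertex itself.

```python
# Sample data from above: sponsor -> sponsored users, and user -> point value.
# (Hypothetical in-memory stand-ins for the 'sponsors' and 'hasPoints' edges.)
SPONSORS = {1: [2, 3, 4], 2: [5], 3: [6], 4: [7]}
POINTS = {1: 5, 2: 5, 3: 5, 4: 5, 5: 5, 6: 4, 7: 5}


def descendants(root):
    """Yield every user reachable via 'sponsors' edges, excluding the root."""
    stack = list(SPONSORS.get(root, []))
    while stack:
        user = stack.pop()
        yield user
        stack.extend(SPONSORS.get(user, []))


def total_points(root, min_points=5):
    """Sum the point values that pass the filter, with no intermediate map or list."""
    return sum(POINTS[u] for u in descendants(root) if POINTS[u] >= min_points)


print(total_points(1))
```

For user 1 this prints `25`, the same total as the traversals above, without keeping any per-level state.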