elasticsearch graph-databases gremlin janusgraph

Gremlin group by on multiple properties

I'm able to write an aggregation/group-by query on a single property of a vertex. The query below also includes ACL evaluation while retrieving the data, which you can ignore when answering the question.

 g.V().has('user','userId',123).emit().until(__.not(outE('member_of'))).repeat(out('member_of')).outE('has_permission').has('permission','view').inV().as('f').select('f').group().by('folderType').by(count())

This gives me the following result:

==>[PROJECT:2,RegularFolder:4,ORGANISATION:7,DIVISION:4]

Just like folderType, the folder vertex has several other properties, such as CreatedBy.

I expect a result similar to an Elasticsearch aggregation query, with one set of counts per property:

"folderType":[PROJECT:2,RegularFolder:4,ORGANISATION:7,DIVISION:4]
"CreatedBy":[user1:2,user2:4,user3:7,user4:4]

How can I write a Gremlin query that produces the above result, or something close to it?

Solution

I know you said to ignore your initial query, but I can't help rewriting it as:

g.V().has('user','userId',123).
  emit().
  until(__.not(outE('member_of'))).
  repeat(out('member_of')).
  outE('has_permission').has('permission','view').inV().
  groupCount().
    by('folderType')

The "f" step label isn't needed, since as('f').select('f') just returns the same vertex, and groupCount() states the intent more precisely than group() with by(count()). If you need to groupCount() on multiple properties, there are probably several ways to do it, but for the case you describe I think the easiest is to calculate two groupCount() side-effects and then cap() both of them out together:

g.V().has('user','userId',123).
  emit().
  until(__.not(outE('member_of'))).
  repeat(out('member_of')).
  outE('has_permission').has('permission','view').inV().
  groupCount('folderType').
    by('folderType').
  groupCount('CreatedBy').
    by('CreatedBy').
  cap('folderType','CreatedBy')
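
To see the shape of what cap() returns with two keys, here is a minimal Gremlin Console sketch against a toy in-memory TinkerGraph. The three folder vertices and the 'folder' label are made up for illustration and stand in for the vertices your ACL traversal reaches. With more than one key, cap() emits a single map keyed by side-effect name, each value being that property's count map, which is close to the Elasticsearch-style result you describe. The second query is one possible alternative without side-effects, using fold() and project(); treat it as a sketch rather than the definitive form.

```groovy
// Toy data (an assumption, not the asker's graph): three folder vertices.
graph = TinkerGraph.open()
g = graph.traversal()
g.addV('folder').property('folderType','PROJECT').property('CreatedBy','user1').
  addV('folder').property('folderType','PROJECT').property('CreatedBy','user2').
  addV('folder').property('folderType','DIVISION').property('CreatedBy','user1').
  iterate()

// Side-effect form: each groupCount('key') fills a named side-effect,
// and cap() with two keys returns one map holding both count maps.
g.V().hasLabel('folder').
  groupCount('folderType').
    by('folderType').
  groupCount('CreatedBy').
    by('CreatedBy').
  cap('folderType','CreatedBy')

// Alternative without side-effects: fold the vertices into a list,
// then compute one groupCount() per projected key over that list.
g.V().hasLabel('folder').fold().
  project('folderType','CreatedBy').
    by(unfold().groupCount().by('folderType')).
    by(unfold().groupCount().by('CreatedBy'))
```

Both queries should yield a map with a folderType entry and a CreatedBy entry. In your case you would replace g.V().hasLabel('folder') with the traversal up to inV().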