elasticsearch graph-databases gremlin janusgraph

Gremlin group by on multiple properties


I'm able to write an aggregation/group-by query on a single property of a vertex. The query below includes ACL evaluation while retrieving the data, which you can ignore when answering the question.

 g.V().has('user','userId',123).emit().until(__.not(outE('member_of'))).repeat(out('member_of')).outE('has_permission').has('permission','view').inV().as('f').select('f').group().by('folderType').by(count())

This gives me the following result:

==>[PROJECT:2,RegularFolder:4,ORGANISATION:7,DIVISION:4]

Like folderType, the folder vertex has several other properties.

I expect output similar to the results of an Elasticsearch aggregation query:

"folderType":[PROJECT:2,RegularFolder:4,ORGANISATION:7,DIVISION:4]
"CreatedBy":[user1:2,user2:4,user3:7,user4:4]

How can I write a Gremlin query that gives the above result, or something close to it?


Solution

  • I know you said to ignore your initial query, but I can't help rewriting it as:

    g.V().has('user','userId',123).
      emit().
      until(__.not(outE('member_of'))).
      repeat(out('member_of')).
      outE('has_permission').has('permission','view').inV().
      groupCount().
        by('folderType')
    

    The step label "f" isn't needed, and groupCount() expresses the intent more precisely in this case. There are probably several ways to groupCount() on multiple properties, but for the case you describe I think the easiest is to calculate two groupCount() side-effects and then cap() both of them out together:

    g.V().has('user','userId',123).
      emit().
      until(__.not(outE('member_of'))).
      repeat(out('member_of')).
      outE('has_permission').has('permission','view').inV().
      groupCount('folderType').
        by('folderType').
      groupCount('CreatedBy').
        by('CreatedBy').
      cap('folderType','CreatedBy')
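
When cap() is given more than one side-effect key, it emits a single map keyed by those side-effect names, so the result has the same shape as the expected output in the question. To make that shape concrete, here is a plain-Python sketch of the same idea (one pass over the folders, one counter per property). It is only an illustration of the semantics, not Gremlin: the `folders` list and the `multi_group_count` helper are hypothetical stand-ins, and the sample values are made up.

```python
from collections import Counter

# Hypothetical stand-ins for the folder vertices reached by the traversal.
folders = [
    {"folderType": "PROJECT", "CreatedBy": "user1"},
    {"folderType": "PROJECT", "CreatedBy": "user2"},
    {"folderType": "DIVISION", "CreatedBy": "user1"},
]

def multi_group_count(items, keys):
    """Count values per key in one pass, like chained groupCount(key).by(key) steps."""
    counters = {key: Counter() for key in keys}
    for item in items:
        for key in keys:
            counters[key][item[key]] += 1
    # Mirrors cap('folderType','CreatedBy'): a map from each key to its counts.
    return {key: dict(counter) for key, counter in counters.items()}

result = multi_group_count(folders, ["folderType", "CreatedBy"])
print(result)
# {'folderType': {'PROJECT': 2, 'DIVISION': 1}, 'CreatedBy': {'user1': 2, 'user2': 1}}
```

Each counter is filled during the same pass over the folders, which is why the side-effect approach avoids running the ACL traversal once per property.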