Storing gremlin count values and comparing later in traversal


I am new to Gremlin and have just started with it. I have a graph where every vertex has the properties Referenceable.qualifiedName and __typeName.

Vertices with '__typeName' 'avro_schema' and 'avro_field' are connected by edges labeled '__avro_record.fields', with a one-to-many relationship between 'avro_schema' (1) and 'avro_field' (many).

Vertices with '__typeName' 'avro_field' and 'DataClassification' are connected by edges labeled 'classifiedAs', with a one-to-many relationship between 'avro_field' (1) and 'DataClassification' (many).

I want to find the 'Referenceable.qualifiedName' property of all 'avro_schema' vertices in the graph where every 'avro_field' has a 'classifiedAs' relationship to a 'DataClassification'.

I have a Gremlin query that, given a particular avro_schema, finds the number of avro_field vertices that have a relationship to a DataClassification via classifiedAs. That works.

But I am not able to keep a count of the edges from avro_schema to avro_field and then compare it with the number of avro_field vertices that have a relationship to a DataClassification.

This gives the number of classified avro_field vertices for a given avro_schema:

g.V().has('__typeName','avro_schema').has('Referenceable.qualifiedName' , "com.example.avro.test2Complicated16").outE('__avro_record.fields').inV().out('classifiedAs').has('__typeName','DataClassification').count()

I also tried this to aggregate across all avro_schema vertices that satisfy the condition, but it doesn't work:

g.V().has('__typeName','avro_schema').where(identity().out('__avro_record.fields').store('sumi').out('classifiedAs').has('__typeName','DataClassification').count().is(eq('sumi'.size()))).values('Referenceable.qualifiedName')

I want to know all the avro_schema vertices in which every avro_field has at least one 'classifiedAs' edge to a DataClassification.

After further attempts I arrived at the following query, but the size of the collection 'xm' is always returned as 0:

g.V().has('__typeName','avro_schema').local(out('__avro_record.fields').store('xm').local(out('classifiedAs').has('__typeName', 'DataClassification').count().is(eq(1))).count().is(eq(select('xm').size())))


Solution

  • Not sure if I'm following the problem description correctly, but here's a wild guess for the traversal you might be looking for:

    g.V().has('__typeName','avro_schema').not(
        out('__avro_record.fields').
        out('classifiedAs').has('__typeName',neq('DataClassification'))).
      values('Referenceable.qualifiedName')
    

    UPDATE

    // at least one __avro_record.fields relation
    g.V().has('__typeName','avro_schema').filter(
        out('__avro_record.fields').
        groupCount().
          by(choose(out('classifiedAs').has('__typeName','DataClassification'),
                      constant('y'), constant('n'))).
        and(select('y'), __.not(select('n')))).
      values('Referenceable.qualifiedName')
    
    // include avro_schema w/o __avro_record.fields relations
    g.V().has('__typeName','avro_schema').not(
        out('__avro_record.fields').
        groupCount().
          by(choose(out('classifiedAs').has('__typeName','DataClassification'),
                      constant('y'), constant('n'))).
        select('n')).
      values('Referenceable.qualifiedName')
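
    For comparison, here is a minimal sketch of two other ways to express "every avro_field is classified", run against a small in-memory TinkerGraph. The toy data, the schema names ('schema.all', 'schema.partial', 'schema.empty') and the addSchema helper are made up for illustration, and this has not been tried against the graph from the question. The first traversal nests two not() steps (no field without a classification); the second keeps both counts in a project() map and compares them with where(), which is closer to what the question's title asks for.

    ```groovy
    import org.apache.tinkerpop.gremlin.process.traversal.P
    import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__
    import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph

    def graph = TinkerGraph.open()
    def g = graph.traversal()

    // Hypothetical helper: one schema plus one field per flag;
    // a field gets a DataClassification only when its flag is true.
    def addSchema = { String name, List<Boolean> flags ->
        def s = g.addV('avro_schema').property('__typeName', 'avro_schema').
                  property('Referenceable.qualifiedName', name).next()
        flags.each { classified ->
            def f = g.addV('avro_field').property('__typeName', 'avro_field').next()
            g.addE('__avro_record.fields').from(s).to(f).iterate()
            if (classified) {
                def c = g.addV('DataClassification').
                          property('__typeName', 'DataClassification').next()
                g.addE('classifiedAs').from(f).to(c).iterate()
            }
        }
    }
    addSchema('schema.all', [true, true])       // every field classified
    addSchema('schema.partial', [true, false])  // one field unclassified
    addSchema('schema.empty', [])               // no fields at all

    // Variant 1: keep schemas that have no field lacking a DataClassification.
    def byNegation = g.V().has('__typeName', 'avro_schema').
        not(__.out('__avro_record.fields').
               not(__.out('classifiedAs').has('__typeName', 'DataClassification'))).
        values('Referenceable.qualifiedName').toList()

    // Variant 2: compute both counts per schema and compare them.
    def byCount = g.V().has('__typeName', 'avro_schema').
        project('name', 'total', 'classified').
          by('Referenceable.qualifiedName').
          by(__.out('__avro_record.fields').count()).
          by(__.out('__avro_record.fields').
               where(__.out('classifiedAs').has('__typeName', 'DataClassification')).
               count()).
        where('total', P.eq('classified')).
        select('name').toList()
    ```

    Both variants also return schemas that have no fields at all, since the condition holds vacuously for them. To drop those, a where(__.out('__avro_record.fields')) filter can be added right after the has('__typeName', 'avro_schema') step.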