I have this Database:
Clients => Incident => File => Filename
Clients have an ID. Incidents have an ID and a reportedOn property. Files have an ID and fileSize, mimeType, and malware properties. Filenames have an ID. A client has an outgoing edge to its incidents (reported), an incident has an outgoing edge to its files (containsFile), and a file has an outgoing edge to its filename (hasName).
Here is some sample data:
g.addV('client').property('id','1').as('1').
addV('incident').property('id','11').property('reportedON', '2/15/2019 8:01:19 AM').as('11').
addV('file').property('id','100').property('fileSize', '432534').property('malwareSource', 'malware').as('100').
addV('fileName').property('id','file.pdf').as('file.pdf').
addE('reported').from('1').to('11').
addE('containsFile').from('11').to('100').
addE('hasName').from('100').to('file.pdf').iterate()
I am executing this query:
g.V().has('malwareSource', 'malware').as('FILE').out('hasName').as('FILENAME').select('FILE').in('containsFile').as('INCIDENT').select('FILE').valueMap().as('FILEVALUES').select('INCIDENT').valueMap().as('INCIDENTVALUES').select('FILE', 'FILEVALUES', 'FILENAME', 'INCIDENTVALUES')
How can I count how many incoming vertices each file marked as 'malware' has?
You really should use project(): the code is much more readable, as shown in a separate question you had here:
gremlin> g.V().has('malwareSource', 'malware').
......1> project('FILE', 'FILENAME', 'FILEVALUES', 'INCIDENTVALUES').
......2> by().
......3> by(out('hasName')).
......4> by(valueMap()).
......5> by(__.in('containsFile').valueMap().fold())
==>[FILE:v[5],FILENAME:v[9],FILEVALUES:[fileSize:[432534],malwareSource:[malware],id:[100]],INCIDENTVALUES:[[reportedON:[2/15/2019 8:01:19 AM],id:[11]]]]
That is much easier to follow, though I still don't understand why you require this returned data structure, as it repeats data in the result for "FILE" and "FILEVALUES". That aside, you can see how easy it is to get the count of incoming edges: just add an extra key to project() and an extra by() modulator to do the count():
gremlin> g.V().has('malwareSource', 'malware').
......1> project('FILE', 'FILENAME', 'FILEVALUES', 'INCIDENTVALUES', 'COUNT').
......2> by().
......3> by(out('hasName')).
......4> by(valueMap()).
......5> by(__.in('containsFile').valueMap().fold()).
......6> by(__.in().count())
==>[FILE:v[5],FILENAME:v[9],FILEVALUES:[fileSize:[432534],malwareSource:[malware],id:[100]],INCIDENTVALUES:[[reportedON:[2/15/2019 8:01:19 AM],id:[11]]],COUNT:1]
You could probably figure out how to do lines 5 and 6 one time to avoid dual iteration, but I would probably try to optimize that as a separate issue and consider adjusting your returned data structure to allow for it.
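As a rough sketch of that idea, one option is to fold the incident value maps once and derive the count from the folded list with count(local), so the incoming vertices are only traversed a single time. This assumes the count should only cover 'containsFile' edges (the query above uses in() with no label, which gives the same result for the sample data) and that nesting both values under one key is acceptable. The 'INCIDENTS', 'VALUES' and 'COUNT' key names are made up for illustration:

```groovy
// Sketch only: assumes counting 'containsFile' neighbours is what is wanted.
// 'INCIDENTS', 'VALUES' and 'COUNT' are hypothetical key names.
g.V().has('malwareSource', 'malware').
  project('FILE', 'FILENAME', 'FILEVALUES', 'INCIDENTS').
    by().
    by(out('hasName')).
    by(valueMap()).
    by(__.in('containsFile').valueMap().fold().   // iterate the incidents once
       project('VALUES', 'COUNT').
         by().                                    // the folded list of value maps
         by(count(local)))                        // size of that same list
```

The trade-off is that the incident values and their count now sit in a nested map under one key rather than as two top-level keys, which is the kind of change to the returned data structure mentioned above.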