I have a developer vertex and a project vertex and an edge with start date and end date properties. A developer can work only on one project at a given point of time. Sometimes they are assigned to multiple projects with the same start date by user by mistake.
I need to find out if a developer has more than one project assigned to them with same start date. If it is then I need to print
001 Akash 2021-06-01 2
If you look at 3rd developer, he was assigned to 2 projects starting 2021-07-01. But one of them is valid as it has an end date which has a corresponding record with start date of 2021-09-01. In my 2nd report I need to list
{"id":"003_P003","label":"works_in","start_date":"2021-07-01","end_date":"2021-07-05"}
which is duplicate and odd one.
I tried following query for #1,but it does not show me both id, start date and count. It only shows me developer's id and count
g.V().
hasLabel('developer').
outE('works_in').
groupCount().by(outV().id()).
groupCount().by('start_date').
limit(2).
unfold().
toList()
Then I tried next two which did not work and shows error message "‘Column’ object is not callable" (I am using Python Germlin)
g.V().
hasLabel('developer').
outE('works_in').by(values(outV().id(), 'start_date')).
groupCount().
unfold().
toList()
The next one below also throws error.
g.V().
hasLabel('developer').
outE('works_in').
groupCount().by(outV().id(), 'start_date').
unfold().
toList()
#2 I did not start yet. I am not sure if it can be done with Gremlin though.
developer (vertex)
[{"id":"001","label":"developer","name":"Akash","skill":"c#"},
{"id":"002","label":"developer","name":"John","skill":"react"},
{"id":"003","label":"developer","name":"Bruno","skill":"python"}]
project (vertex)
[{"id":"P001","label":"project","name":"Web App"},{"id":"P002","label":"project","name":"Smart Contract"}, {"id":"P003","label":"project","name":"Migrate to AWS"}]
works_in (edge)
[{"id":"001_P001","label":"works_in","start_date":"2021-06-01","end_date":"2021-12-31"},
{"id":"002_P002","label":"works_in","start_date":"2021-01-01","end_date":"2021-12-31"},
{"id":"001_P002","label":"works_in","start_date":"2021-06-01","end_date":"2021-06-30"},
{"id":"003_P003","label":"works_in","start_date":"2021-01-01","end_date":"2021-06-30"},
{"id":"003_P003","label":"works_in","start_date":"2021-07-01","end_date":"2021-07-05"},
{"id":"003_P002","label":"works_in","start_date":"2021-07-01","end_date":"2021-08-31"},
{"id":"003_P002","label":"works_in","start_date":"2021-09-01","end_date":"2021-12-31"}
]
Any help is appreciated. Also I am looking for tutorial/course to understand Gremlin query better other than Tinkerpop's official documentation.
gremlin> g.V().
......1> hasLabel('developer').
......2> local(
......3> __.as('b').
......4> outE().
......5> project('id', 'name', 'startDate').
......6> by(select('b').values('id')).
......7> by(select('b').values('name')).
......8> by(values('startDate')).
......9> groupCount().
.....10> unfold().as('a').
.....11> select(values).
.....12> is(gt(1)).
.....13> select('a')).
.....14> local(
.....15> union(select(keys).unfold().select(values), select(values)).fold())
==>[003,bruno,2021-07-01,2]
==>[001,akash,2021-06-01,2]
I couldn't write any straight forward query for the second usecase.