I am new to graph databases in general and am trying to learn the Gremlin query language. I was wondering if there is a way to directly merge the immediate neighbors of two vertices whose ids are known. For example, consider the graph created by the script further down.
I don't want to traverse the entire graph; I just want the two subgraphs merged on their common neighbors and sorted by the sum of the weights of the two edges leading to the same vertex.
In the above graph, I want to be able to display the vertices A, B, C and D when I query with the vertices 1 and 2. I want to merge the outE of vertex 1 and vertex 2, aggregate the weights of (edge 1 -> A and edge 2 -> A), (edge 1 -> B and edge 2 -> B), (edge 1 -> C and edge 2 -> C) and (edge 1 -> D and edge 2 -> D), and sort the result by this combined score.
The code for the graph creation is below:
g.addV().property('id',1).property("type","A").as('1')
addV().property('id',2).property("type","B").as('2').
addV().property('id',A).property("type","X").as('A').
addV().property('id',B).property("type","X").as('B').
addV().property('id',C).property("type","X").as('C').
addV().property('id',D).property("type","X").as('D').
addE('connects').from('1').to('A').property("weight",0.1d)
addE('connects').from('1').to('B').property("weight",0.4d)
addE('connects').from('1').to('C').property("weight",0.2d)
addE('connects').from('1').to('D').property("weight",0.7d)
addE('connects').from('2').to('A').property("weight",0.5d)
addE('connects').from('2').to('B').property("weight",0.2d)
addE('connects').from('2').to('C').property("weight",0.7d)
addE('connects').from('2').to('D').property("weight",0.4d).iterate()
If I were to represent the above data in SQL, a sample model would be as below:
create table items(id varchar(20), toId varchar(20), weight double(5,4), primary key (id, toId));
insert into items values("1","A",0.1);
insert into items values("1","B",0.4);
insert into items values("1","C",0.2);
insert into items values("1","D",0.7);
insert into items values("2","A",0.5);
insert into items values("2","B",0.2);
insert into items values("2","C",0.7);
insert into items values("2","D",0.4);
select a.toId, a.weight+b.weight as weight from items a, items b where a.id = "1" and b.id = "2" and a.toId = b.toId order by weight desc;
This returns:
D, 1.1
C, 0.9
B, 0.6
A, 0.6
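To make the expected result concrete, the same join-and-sum can be sketched in plain Python. This is only an illustration of the computation, not Gremlin; the function name combined_weights is made up for this sketch, and the sums are rounded to hide floating-point noise.

```python
# Rows mirror the INSERT statements above: (id, toId, weight).
items = [
    ("1", "A", 0.1), ("1", "B", 0.4), ("1", "C", 0.2), ("1", "D", 0.7),
    ("2", "A", 0.5), ("2", "B", 0.2), ("2", "C", 0.7), ("2", "D", 0.4),
]


def combined_weights(rows, left_id, right_id):
    """Sum the weights of edges from left_id and right_id that share a target."""
    left = {to: w for src, to, w in rows if src == left_id}
    right = {to: w for src, to, w in rows if src == right_id}
    # Only targets reachable from both sources survive the join.
    totals = {to: round(left[to] + right[to], 4) for to in left.keys() & right.keys()}
    # Highest combined weight first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


result = combined_weights(items, "1", "2")
for to_id, weight in result:
    print(to_id, weight)
```

D and C come out first with 1.1 and 0.9; A and B tie at 0.6, so their relative order is not significant.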
Any help with this is highly appreciated.
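I made some adjustments to your data creation script because it had some syntax errors (missing step separators and unquoted string ids). Note that the weights on the edges from vertex 2 in my version differ from yours, so the sums shown here will not match your SQL result: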
g.addV().property('id',1).property("type","A").as('1').
addV().property('id',2).property("type","B").as('2').
addV().property('id','A').property("type","X").as('A').
addV().property('id','B').property("type","X").as('B').
addV().property('id','C').property("type","X").as('C').
addV().property('id','D').property("type","X").as('D').
addE('connects').from('1').to('A').property("weight",0.1d).
addE('connects').from('1').to('B').property("weight",0.4d).
addE('connects').from('1').to('C').property("weight",0.2d).
addE('connects').from('1').to('D').property("weight",0.7d).
addE('connects').from('2').to('A').property("weight",0.1d).
addE('connects').from('2').to('B').property("weight",0.4d).
addE('connects').from('2').to('C').property("weight",0.2d).
addE('connects').from('2').to('D').property("weight",0.7d).iterate()
There may be other ways to do this, but I started by collecting the edges we need between the two start vertices, "1" and "2":
gremlin> g.V().has('id',1).
......1> outE('connects').as('1e').
......2> inV().as('v').
......3> inE('connects').as('2e').
......4> where(outV().has('id',2)).
......5> select('1e','v','2e')
==>[1e:e[18][0-connects->6],v:v[6],2e:e[22][3-connects->6]]
==>[1e:e[19][0-connects->9],v:v[9],2e:e[23][3-connects->9]]
==>[1e:e[20][0-connects->12],v:v[12],2e:e[24][3-connects->12]]
==>[1e:e[21][0-connects->15],v:v[15],2e:e[25][3-connects->15]]
So the above gives us all the shared edges plus the vertices we want to sum the weights for. You can do the summation with the group() step:
gremlin> g.V().has('id',1).
......1> outE('connects').as('1e').
......2> inV().as('v').
......3> inE('connects').as('2e').
......4> where(outV().has('id',2)).
......5> select('1e','v','2e').
......6> group().
......7> by(select('v').by('id')).
......8> by(select(values).
......9> unfold().has('weight').
.....10> values('weight').sum())
==>[A:0.2,B:0.8,C:0.4,D:1.4]
The first by() modulator grabs the "v" key from the incoming Map of results and extracts the "id" property from the vertex within it. The second by() modulator produces the sum by doing something that might not immediately be clear. It gets the values from the incoming Map (i.e. the vertex and related edges) then finds only those elements with the edge property of "weight" (the vertex will be filtered out). Finally, it sums those weight values.
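What the two by() modulators do can be modeled in plain Python. This is a rough sketch rather than how TinkerPop implements group(): each row stands in for one Map produced by select('1e','v','2e'), elements are reduced to dictionaries of their properties, and the name group_and_sum is invented for the illustration.

```python
# Edge weights from the adjusted creation script in this answer.
weights_from_1 = {"A": 0.1, "B": 0.4, "C": 0.2, "D": 0.7}
weights_from_2 = {"A": 0.1, "B": 0.4, "C": 0.2, "D": 0.7}

# One row per shared vertex: the edge from 1, the vertex, the edge from 2.
rows = [
    {
        "1e": {"weight": weights_from_1[v]},
        "v": {"id": v, "type": "X"},
        "2e": {"weight": weights_from_2[v]},
    }
    for v in "ABCD"
]


def group_and_sum(rows):
    totals = {}
    for row in rows:
        key = row["v"]["id"]           # first by(): id of the shared vertex
        for element in row.values():   # select(values).unfold()
            if "weight" in element:    # has('weight') filters out the vertex
                totals[key] = totals.get(key, 0.0) + element["weight"]
    # Round to hide floating-point noise in the printed sums.
    return {k: round(t, 4) for k, t in totals.items()}


grouped = group_and_sum(rows)
print(grouped)
```

The vertex dictionary has no "weight" key, so only the two edges in each row contribute to the sum, which mirrors the filtering described above.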
This bit of Gremlin might be tuned a bit too much to the data in your example. If the graph structure creates more than these simple paths, I imagine that you'd see some edge duplication in your output. If you don't care about the duplication and you want to sum it all, then what I have will work as-is. If you only want to count unique edge weights in your sum, then you probably need to add a dedup() step after the has('weight').
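The effect of that duplication can be illustrated with a small Python model. The data here is hypothetical and not part of your graph: it assumes vertex 2 has two parallel edges to A, so the single edge from 1 to A shows up in two result rows. The edge ids and the sum_weights helper are made up for the sketch.

```python
# Hypothetical rows for vertex A: the edge 1->A (id "e1") is paired once with
# each of the two parallel edges 2->A (ids "e2" and "e3").
e1a = {"id": "e1", "weight": 0.1}
rows_for_a = [
    {"1e": e1a, "2e": {"id": "e2", "weight": 0.1}},
    {"1e": e1a, "2e": {"id": "e3", "weight": 0.3}},
]


def sum_weights(rows, dedup=False):
    """Sum edge weights across rows, optionally counting each edge only once."""
    seen = set()
    total = 0.0
    for row in rows:
        for edge in row.values():
            if dedup:
                if edge["id"] in seen:
                    continue  # this edge was already counted
                seen.add(edge["id"])
            total += edge["weight"]
    return round(total, 4)


with_duplicates = sum_weights(rows_for_a)            # e1 counted twice
unique_edges = sum_weights(rows_for_a, dedup=True)   # e1 counted once
print(with_duplicates, unique_edges)
```

Without deduplication the shared edge contributes twice (0.6 in total); with it, each edge contributes once (0.5).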
If you need to order the result then you can add the order() step and order by the values in the Map result:
gremlin> g.V().has('id',1).
......1> outE('connects').as('1e').
......2> inV().as('v').
......3> inE('connects').as('2e').
......4> where(outV().has('id',2)).
......5> select('1e','v','2e').
......6> group().
......7> by(select('v').by('id')).
......8> by(select(values).
......9> unfold().has('weight').
.....10> values('weight').sum()).
.....11> order(local).by(values, desc)
==>[D:1.4,B:0.8,C:0.4,A:0.2]
Note that you use local because you are ordering within the current Map of the traversal stream and not ordering the stream itself.
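The distinction can be pictured with a plain-Python analogy, offered as a loose model rather than a description of TinkerPop internals: the stream holds a single element (the Map), so the sort has to happen inside that element.

```python
# The grouped result is one map travelling through the stream.
grouped = {"A": 0.2, "B": 0.8, "C": 0.4, "D": 1.4}
stream = [grouped]

# Analogue of order(local).by(values, desc): sort the entries inside each map.
ordered_stream = [
    dict(sorted(m.items(), key=lambda kv: kv[1], reverse=True)) for m in stream
]

# The stream still contains exactly one element; only its entries were reordered.
print(len(ordered_stream), list(ordered_stream[0].items()))
```

Sorting the stream itself would compare whole maps against each other, and with only one map in it there would be nothing to reorder.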