Sort the vertices of a given label
by the descending number of incoming edges of a given type
in the most efficient way.
conf/remote.yaml
file used)label
)type
)g.V().hasLabel(<label>).
order().
by(inE(<type>).count(),desc).
limit(10).
project("name","score").
by(<property_name>).
by(inE(<type>).count())
While this query gives the expected results, it is really slow (up to 7+ minutes) and this execution time is not affordable. Is there a way to improve it? Whether it is an improvement to the query itself, or adding an index somewhere that could help...
type
, but the total number of them. Would it still be a beneficial side-effect (assuming I understand them correctly) of the vertex-centric indexes?Thanks to everyone reading/replying!
vertex_label
property has been added and indexed on, and changing the hasLabel(<label>)
into has("vertex_label", <label>)
, the execution time went down again from 2 minutes to ~5 seconds. The vertex_label
property doesn't have any other use but to enable indexing on the type of vertex, JanusGraph not supporting indexing on labels, it provides a workaround.It seems that perhaps your query is doing quite a bit of repeated work there. What about something like this:
g.V().hasLabel('<label>').
group().
by('<key>').
by(inE('<type>').count()).
order(local).
by(values,desc).
unfold().
limit(10)
From memory, I don't think JanusGraph supports indexing on a label which may be part of the problem here. If so, storing the label also as a property, and creating an index on that property may help the initial finding of the 2K vertices.
UPDATED 2022-06-21 To show an actual example.
Using the air-routes data set, the query might look like this:
gremlin> g.V().hasLabel('airport').
......1> group().
......2> by('code').
......3> by(inE('route').count()).
......4> order(local).
......5> by(values,desc).
......6> unfold().
......7> limit(10)
==>FRA=307
==>IST=307
==>CDG=294
==>AMS=284
==>MUC=271
==>ORD=263
==>DFW=251
==>PEK=249
==>DXB=247
==>ATL=242