I have a user search feature in my app where the searcher don't want to see some results, he does this by "blocking" a tag, when blocking a tag all users that are "subscribed" to that tag will be ignored in his search results.
I'm writing the query to filter the search results and I found 2 ways of getting the same:
First:
g.V(1991)
.out("blocked").fold().as("blockedTags")
.V().hasLabel("user")
.not(
where(
out("subscribed").where(
within("blockedTags")
)
)
)
Second:
g.V(1991).as("user")
.V().hasLabel("user")
.not(
where(
out("subscribed")
.in("blocked")
.as("user")
)
)
Gremlify: https://gremlify.com/xnqhvtzo6b
One uses within() and the other performs 2 steps out() and in(), I want to know which one is faster so I can decide which one to use, these 2 options are possible in many queries of my application.
EDIT:
I ran both queries in the gremlin console with profile() step at the end but the >TOTAL
field gives random time numbers from 0.300ms to 1.220ms for both queries, because of this I don't know how to compare the performance of 2 queries.
I will offer a general answer here that is largely derived from the comments on the question itself. It really isn't possible to profile()
one graph and then project those results on another. They will each have different capabilities and performance characteristics. If you need to know which of two approaches to a query is better, then you must test both traversals on the graph system you intend to target.
I'd also be wary of going too far in a particular development direction without doing ongoing testing on the target graph. Just as you wouldn't do all your development on MySQL only to switch to Oracle when it was time to go to production, you really shouldn't try to build your entire application against a graph you don't intend to use. There are subtle differences in these systems that could make a significant differences to you.
As to the differences in profile()
times on TinkerGraph, there is bound to be timing differences on the JVM for what I'm guessing is a test on a small dataset that resides in memory. Or perhaps for TinkerGraph there is no significant difference between the two approaches. Consider trying to execute the queries a few thousand times and average the time taken and compare that. Gremlin Console has a clock()
function that helps with that. Of course, as I alluded to earlier what you learn there is no guarantee that you have the right solution on Neptune.
If you'd like a bit of analysis about your queries I could offer a few words (though I don't base this thinking on Neptune specifically). How each performs depends a lot on your graph structure, but I think I'd be the first query to be faster because it captures "blocked" vertices with:
.out("blocked").fold()
and re-use it over and over for however many V().hasLabel('user')
there are. That's just a gut feeling though. I'm guessing the blocked list will be relatively small for a single user so traversing the opposing way with:
out("subscribed").in("blocked")
would just be more expensive as you would have to traverse a lot more "blocked" edges that don't terminate with the initial vertex.