In AWS Neptune, I am trying to create a similarity edge between user vertices by calculating cosine similarity, as described by Daniel Kuppitz (https://gist.github.com/dkuppitz/79e0b009f0c9ae87db5a#file-cosim-groovy-L368). Gremlin's sideEffect step accepts a closure, which makes it easy to do the math for the similarity score and write that value to each 'similarity' edge. Alas, Neptune does not support sideEffect. I'm looking for a way to run the commented-out section in the example below in a single Gremlin query without using sideEffect. Thanks for your help!
g.V().match(
    __.as("u1").outE("rated").as("r1"),
    __.as("r1").inV().as("m"),
    __.as("m").inE("rated").as("r2"),
    __.as("r2").outV().as("u2")
  ).where("u1", neq("u2")).
  group().by(select("u1","u2")).
          by(select("r1","r2").by("rating")).
  unfold().
  as("kv").
  select(keys).
  addE("similarity").from("u1").to("u2").as("e").
  // sideEffect {
  //   def r = it.get("kv").getValue()
  //   def xyDotProduct = r.collect {it.r1*it.r2}.sum()
  //   def xLength = Math.sqrt(r.collect {it.r1*it.r1}.sum())
  //   def yLength = Math.sqrt(r.collect {it.r2*it.r2}.sum())
  //   def similarity = xyDotProduct / (xLength * yLength)
  //   it.get().property("similarity", similarity)
  // }.iterate()
Amazon Neptune supports the sideEffect step but does not support the use of Groovy closures with any step. I believe the same effect shown in the example could likely be achieved using a combination of the math and project steps. Practical Gremlin includes a somewhat similar (in terms of complexity) calculation, a Haversine great-circle distance. Perhaps you could use an approach similar to this one:
start = 'SFO'
stop = 'NRT'
g.withSideEffect("rdeg", 0.017453293).
  withSideEffect("gcmiles",3956).
  V().has('code',start).as('src').
  V().has('code',stop).as('dst').
  select('src','dst').
    by(project('lat','lon').
         by('lat').
         by('lon')).
  as('grp').
  project('ladiff','lgdiff','lat1','lon1','lat2','lon2').
    by(project('la1','la2').
         by(select('grp').select('src').select('lat')).
         by(select('grp').select('dst').select('lat')).
       math('(la2 - la1) * rdeg')).
    by(project('lg1','lg2').
         by(select('grp').select('src').select('lon')).
         by(select('grp').select('dst').select('lon')).
       math('(lg2 - lg1) * rdeg')).
    by(select('grp').select('src').select('lat')).
    by(select('grp').select('src').select('lon')).
    by(select('grp').select('dst').select('lat')).
    by(select('grp').select('dst').select('lon')).
  math('(sin(ladiff/2))^2 + cos(lat1*rdeg) * cos(lat2*rdeg) * (sin(lgdiff/2))^2').
  math('gcmiles * (2 * asin(sqrt(_)))')
Instead of using variables inside a closure, this type of approach uses project to create effectively the same variables, which can then be passed to math steps.
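Applied to the cosine similarity in the question, the same pattern might look something like the sketch below. It is untested and has not been run against Neptune, so treat it as a starting point only. It assumes the rating property is numeric, adds an explicit fold() so that each user pair keeps its full list of rating pairs, and uses the labels xy, xx, yy and sim, which are names invented for this illustration. The three sums play the role of xyDotProduct, xLength squared and yLength squared from the commented closure.

```groovy
g.V().match(
    __.as("u1").outE("rated").as("r1"),
    __.as("r1").inV().as("m"),
    __.as("m").inE("rated").as("r2"),
    __.as("r2").outV().as("u2")
  ).where("u1", neq("u2")).
  group().
    by(select("u1","u2")).
    by(select("r1","r2").by("rating").fold()).   // list of [r1:..., r2:...] maps per user pair
  unfold().as("kv").
  // xy = sum(r1*r2), xx = sum(r1*r1), yy = sum(r2*r2); names are illustrative
  project("xy","xx","yy").
    by(select(values).unfold().math("r1 * r2").sum()).
    by(select(values).unfold().math("r1 * r1").sum()).
    by(select(values).unfold().math("r2 * r2").sum()).
  math("xy / (sqrt(xx) * sqrt(yy))").as("sim").
  addE("similarity").
    from(select("kv").select(keys).select("u1")).
    to(select("kv").select(keys).select("u2")).
    property("similarity", select("sim")).
  iterate()
```

Like the original query, this visits every pair in both directions, so it would create one edge from u1 to u2 and another from u2 to u1. If a user's ratings are all zero, the division is by zero, so that case may need to be filtered out beforehand.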