I have a graph database where each Sentence node has an embeddings property, which is an array of length 768. I want to create a property on each of these nodes that holds the average of all the neighboring nodes' embeddings.
Basically,
for each node in the graph:
    sum = [0] * 768
    count = 0
    for neighbour in node.neighbours:
        sum = vector_sum(sum, neighbour.embeddings)
        count += 1
    avg = sum / count
    node.neighbours_average = avg
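For reference, the loop above could be written as runnable Python along these lines. This is only a sketch: the function name neighbours_average and the choice to return None for a node without neighbours are my own assumptions, and it works on plain lists rather than on neomodel objects.

```python
from typing import List, Optional, Sequence


def neighbours_average(
    neighbour_embeddings: Sequence[Sequence[float]],
) -> Optional[List[float]]:
    """Element-wise mean of the neighbours' embedding vectors.

    Hypothetical helper: returns None when there are no neighbours,
    so that we never divide by zero.
    """
    if not neighbour_embeddings:
        return None

    dim = len(neighbour_embeddings[0])
    total = [0.0] * dim
    for embedding in neighbour_embeddings:
        if len(embedding) != dim:
            raise ValueError("all embeddings must have the same length")
        # Accumulate the element-wise sum.
        for i, value in enumerate(embedding):
            total[i] += value

    count = len(neighbour_embeddings)
    return [component / count for component in total]
```

Calling neighbours_average([n.embeddings for n in node.neighbours]) would then give the value to store on the node, assuming node.neighbours yields objects with an embeddings list.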
Currently, I'm using neomodel to read in the nodes, compute this in Python, and then write the result back to the graph. Understandably, it is pretty slow.
What would be the most efficient way to do this?
I've looked at the Graph Data Science library, APOC, etc., but none of them offers vector operations.
I was able to do this with the following query:
Note: embedding is an array of floats.
match (s:Sentence)-[r:RELATED]-(t:Sentence)
with s as sentence, collect(t.embedding) as neighbours_embeddings
set sentence.neighbour_avg = [
    w in reduce(
        s=[], neighbour_embedding IN neighbours_embeddings |
        case when size(s) = 0 then neighbour_embedding
        else [
            i in range(0, size(s)-1) |
            s[i] + neighbour_embedding[i]
        ] end) |
    w / toFloat(size(neighbours_embeddings))
]
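To sanity-check the query on a few nodes, the same reduce-then-divide logic can be mirrored in plain Python and compared against what ends up in neighbour_avg. This is just a sketch for verification; the name cypher_style_average is made up for illustration and nothing here talks to the database.

```python
from functools import reduce
from typing import List


def cypher_style_average(neighbours_embeddings: List[List[float]]) -> List[float]:
    """Mirror of the Cypher expression: reduce to an element-wise sum, then divide."""

    def add(acc: List[float], neighbour_embedding: List[float]) -> List[float]:
        # Same as: case when size(s) = 0 then neighbour_embedding else [...] end
        if len(acc) == 0:
            return neighbour_embedding
        # Cypher's range(0, size(s)-1) is inclusive; Python's range(len(acc))
        # covers the same indices.
        return [acc[i] + neighbour_embedding[i] for i in range(len(acc))]

    summed = reduce(add, neighbours_embeddings, [])
    # Same as: w / toFloat(size(neighbours_embeddings))
    return [w / float(len(neighbours_embeddings)) for w in summed]
```

Note that in the query itself a Sentence with no RELATED neighbours is never matched, so it simply gets no neighbour_avg property; in this Python mirror an empty input returns an empty list.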