
Neo4j vector operations on arrays


I have a graph database where each Sentence node has an embeddings property, which is an array of length 768. I want to add a property to each of these nodes that holds the average of the embeddings of all its neighbouring nodes.

Basically,

for each node in the graph:
    total = [0] * 768
    count = 0
    for neighbour in node.neighbours:
        total = vector_sum(total, neighbour.embeddings)
        count += 1
    if count > 0:
        avg = [x / count for x in total]
        node.neighbours_average = avg
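A minimal, runnable Python version of the loop above, assuming the graph has already been loaded into memory as a dict of node id to embedding plus a list of undirected edges. The function name neighbour_averages and this data layout are hypothetical (this is not neomodel's API); it is mainly useful as a reference to check a database-side result against.

```python
from typing import Dict, List, Tuple


def neighbour_averages(
    embeddings: Dict[str, List[float]],
    edges: List[Tuple[str, str]],
) -> Dict[str, List[float]]:
    """Average the embeddings of each node's neighbours.

    Hypothetical in-memory layout: `embeddings` maps node id -> vector,
    `edges` lists undirected (a, b) pairs between node ids.
    """
    # Build an undirected adjacency list.
    neighbours: Dict[str, List[str]] = {node: [] for node in embeddings}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    result: Dict[str, List[float]] = {}
    for node, nbrs in neighbours.items():
        if not nbrs:
            continue  # isolated node: skip it instead of dividing by zero
        total = [0.0] * len(embeddings[node])
        for n in nbrs:
            # Element-wise vector sum.
            total = [t + x for t, x in zip(total, embeddings[n])]
        result[node] = [t / len(nbrs) for t in total]
    return result


if __name__ == "__main__":
    emb = {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]}
    print(neighbour_averages(emb, [("a", "b"), ("a", "c")]))
    # {'a': [4.0, 5.0], 'b': [1.0, 2.0], 'c': [1.0, 2.0]}
```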

Currently, I'm using neomodel to read in the nodes, compute the averages in Python and then write the results back into the graph. As expected, this is quite slow.

What would be the most efficient way to do this?

I've looked at the Graph Data Science library, APOC, etc., but none of them seem to have vector operations.


Solution

  • I was able to do this with the following query:

    Note: embedding is the array-of-floats property on each Sentence node (called embeddings in the question)

    MATCH (s:Sentence)-[:RELATED]-(t:Sentence)
    WITH s AS sentence, collect(t.embedding) AS neighbours_embeddings
    // Sum the neighbours' vectors element-wise, then divide by the neighbour count
    SET sentence.neighbour_avg = [
      w IN reduce(
        acc = [], neighbour_embedding IN neighbours_embeddings |
        CASE WHEN size(acc) = 0 THEN neighbour_embedding
        ELSE [
          i IN range(0, size(acc) - 1) |
          acc[i] + neighbour_embedding[i]
        ] END) |
        w / toFloat(size(neighbours_embeddings))
    ]
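To sanity-check what the query computes, here is a pure-Python mirror of its reduce step using functools.reduce; the function name cypher_style_average is hypothetical. As in the query, the accumulator starts empty, takes the first neighbour's vector as-is, adds each later vector element-wise, and the sum is finally divided by the number of neighbours.

```python
from functools import reduce
from typing import List


def cypher_style_average(neighbours_embeddings: List[List[float]]) -> List[float]:
    # Mirrors: reduce(acc = [], e IN list |
    #            CASE WHEN size(acc) = 0 THEN e ELSE [i IN range(...) | acc[i] + e[i]] END)
    summed = reduce(
        lambda acc, emb: emb if len(acc) == 0
        else [acc[i] + emb[i] for i in range(len(acc))],
        neighbours_embeddings,
        [],
    )
    # Mirrors: [w IN ... | w / toFloat(size(neighbours_embeddings))]
    # An empty input yields an empty list, so no division happens.
    return [w / float(len(neighbours_embeddings)) for w in summed]


print(cypher_style_average([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]))  # [3.0, 5.0]
```

Because the query uses a plain MATCH, only sentences with at least one RELATED neighbour produce a row, so the division never sees a zero count; sentences without neighbours simply get no neighbour_avg property.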