Search code examples
pythonneo4jpy2neo

How can I randomly walk between nodes by weights in Neo4j with Python?


I've created nodes in Neo4j with the following code,

from py2neo import Graph, Node, Relationship

g = Graph(password='neo4j')
tx = g.begin()

node1 = Node('Node', name='Node-1')
node2 = Node('Node', name='Node-2')
node3 = Node('Node', name='Node-3')
node4 = Node('Node', name='Node-4')
node5 = Node('Node', name='Node-5')
node6 = Node('Node', name='Node-6')
node7 = Node('Node', name='Node-7')

tx.create(node1)
tx.create(node2)
tx.create(node3)
tx.create(node4)
tx.create(node5)
tx.create(node6)
tx.create(node7)

rel12 = Relationship(node1, '0.2', node2, weight=0.2)
rel13 = Relationship(node1, '0.2', node3, weight=0.2)
rel14 = Relationship(node1, '0.6', node4, weight=0.6)
rel45 = Relationship(node4, '0.5', node5, weight=0.5)
rel46 = Relationship(node4, '0.3', node6, weight=0.3)
rel47 = Relationship(node4, '0.2', node7, weight=0.2)

tx.create(rel12)
tx.create(rel13)
tx.create(rel14)
tx.create(rel45)
tx.create(rel46)
tx.create(rel47)

tx.commit()

Here is the graph in Neo4j interface, neo4j-graph

I want to select a node by name and then, I want to walk to another node randomly. But random selection should be like that,

import random

random.choices(['Node-2', 'Node-3', 'Node-4'], weights=(0.2, 0.2, 0.6))

I can select the node with following code, but I don't know how to walk to the other node randomly.

from py2neo import Graph
from py2neo.matching import NodeMatcher

g = Graph(password='neo4j')
nodes = NodeMatcher(g)
node1 = nodes.match('Node', name='Node-1').first()

If node-1 is the starting point, ways that can be walked,

Node-1 -> Node-2
Node-1 -> Node-3
Node-1 -> Node-4 -> Node-5
Node-1 -> Node-4 -> Node-6
Node-1 -> Node-4 -> Node-7

Any idea? Thanks in advance.


Solution

  • Py2neo supports making Cypher queries, and here is a nice hello-world tutorial on how to do that.

    So, I will provide a commented Cypher query that will hopefully work for you.

    But first, a few notes:

    • You should not give relationships that serve the same purpose a virtually infinite number of types (like "0.2", "0.5", etc.), as that is VERY unhelpful when you want to search for specific relationships by type (which is one of the most common things you will want to do) and will cause an enormous number of relationship types. So I assume in my answer that the relationships of interest all actually have the TO type.
    • My query uses a temporary Temp node to store the temporary state of the query as it iterates through the relationships in a random path. The Temp node will be deleted at the end of the query.

    The query is as follows:

    // Get the (assumed-unique) starting Node `n` 
    MATCH (n:Node)
    WHERE n.name = 'Node-1'
    
    // Create (if necessary) the unique `Temp` node, and initialize
    // it with the native ID of the starting node and an empty `pathRels` list
    MERGE (temp:Temp)
    SET temp = {id: ID(n), pathRels: []}
    WITH temp
    
    // apoc.periodic.commit() repeatedly executes the query passed to it
    // until it returns 0 or NULL.
    // The query passed here iteratively extends the list of relationships
    // in `temp.pathRels`. In each iteration, if the current `temp.id`
    // node has any outgoing `TO` relationships, the query:
    // - appends to `temp.pathRels` a randomly-selected relationship, taking
    //   into account the relationship weights (which MUST sum to 1.0),
    // - sets `temp.id` to the ID of the end node of that selected relationship,
    // - and returns 1.
    // But if the current `temp.id` node has no outgoing `TO` relationships, then
    // the query returns 0.
    CALL apoc.periodic.commit(
      "
        MATCH (a:Node)
        WHERE ID(a) = $temp.id
        WITH a, [(a)-[rel:TO]->() | rel] AS rels
        LIMIT 1 // apoc.periodic.commit requires a LIMIT clause. `LIMIT 1` should be harmless here.
        CALL apoc.do.when(
          SIZE(rels) > 0,
          '
           WITH temp, a, REDUCE(s={x: rand()}, r IN rels | CASE
             WHEN s.x IS NULL THEN s
             WHEN s.x < r.weight THEN {x: NULL, pathRel: r}
             ELSE {x: s.x - r.weight} END
           ).pathRel AS pathRel
           SET temp.id = ID(ENDNODE(pathRel)), temp.pathRels = temp.pathRels + pathRel
           RETURN 1 AS result
          ',
          '
           RETURN 0 AS result
          ',
          {temp: $temp, a: a, rels: rels}
        ) YIELD value
        RETURN value.result
      ",
      {temp: temp}
    ) YIELD batchErrors
    
    // Use the `temp.pathRels` list to generate the `weightedRandomPath`
    // (or you could just return `pathRels` as-is).
    // Then delete the `Temp` node, since it is no longer needed.
    // Finally, return `weightedRandomPath`, and also the `batchErrors` returned by
    // apoc.periodic.commit() (in case it had any errors). 
    WITH temp, apoc.path.create(STARTNODE(temp.pathRels[0]), temp.pathRels) AS weightedRandomPath, batchErrors
    DELETE temp
    RETURN weightedRandomPath, batchErrors