python neo4j cypher py2neo

neo4j: in a forum structure, how to find how many responses each post got (including children of children...)


I'm writing a program that analyses posts in a forum.
After loading the forum threads into a neo4j DB,
I'm trying to rank posts by the number of responses they got.

Responses include direct responses as well as the entire subtree under each direct response.
The idea is to count all descendants down the tree (the tree is a simple tree without any loops).

Every post is a neo4j node

# Create MSG nodes:
statement = "CREATE (c:MSG {id:{N}, title:{T}}) RETURN c"
for msg in msgs:
    graph.cypher.execute(statement, {"N": msg[0], "T": msg[1]})

A node that represents a response to another post has a CHILD_OF relationship to its parent node.
Root nodes have no CHILD_OF relationship, but have "0" as their parent ID.
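
The relationship creation itself is not shown here. A minimal sketch of how it might look, assuming the same py2neo 2.0 `graph.cypher.execute` call used for the nodes and a hypothetical list of `(parent id, msg id)` pairs (the name `link_children` and the input shape are illustrative, not from the original code):

```python
# Sketch only: link each response to its parent with a CHILD_OF relationship.
# Uses the same legacy {param} placeholder syntax as the CREATE statement above.
LINK_STATEMENT = (
    "MATCH (c:MSG {id:{C}}), (p:MSG {id:{P}}) "
    "CREATE (c)-[:CHILD_OF]->(p)"
)


def link_children(graph, pairs):
    """Create one CHILD_OF relationship per non-root (parent_id, msg_id) pair.

    `pairs` is a hypothetical input shape; adapt it to how the posts are stored.
    Returns the number of relationships requested.
    """
    linked = 0
    for parent_id, msg_id in pairs:
        if parent_id == 0:
            continue  # root post: no CHILD_OF relationship
        graph.cypher.execute(LINK_STATEMENT, {"C": msg_id, "P": parent_id})
        linked += 1
    return linked
```

The MSG nodes must already exist when this runs, since the statement matches both ends of the relationship by `id`.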

|parent id | msg id | Rank | List of all responses
+----------+--------+------+----------------------
|0         | 1051   | 3    | (1054, 1056, 1060)
|1051      | 1054   | 0    |
|1051      | 1056   | 1    | (1060)
|1056      | 1060   | 0    |
|0         | 1052   | 0    |

In this table:

  • msg 1051 is a first post in a thread
  • msg 1052 is a first post in another thread
  • msg 1051 got 2 direct responses (1054, 1056) and one indirect response (1060)
  • msg 1056 got 1 direct response (1060)
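
To make the expected ranking concrete, here is a small pure-Python sketch that computes the same numbers from the table rows, without touching the database. The names `ROWS` and `descendants_by_post` are illustrative only; the data is copied from the table above.

```python
from collections import defaultdict

# (parent id, msg id) pairs copied from the table; parent 0 marks a root post.
ROWS = [(0, 1051), (1051, 1054), (1051, 1056), (1056, 1060), (0, 1052)]


def descendants_by_post(rows):
    """Map each msg id to the sorted list of all its direct and indirect responses."""
    children = defaultdict(list)
    for parent_id, msg_id in rows:
        if parent_id != 0:
            children[parent_id].append(msg_id)

    result = {}
    for _, msg_id in rows:
        found = []
        stack = list(children.get(msg_id, []))
        # Walk the whole subtree below msg_id (the tree has no loops).
        while stack:
            current = stack.pop()
            found.append(current)
            stack.extend(children.get(current, []))
        result[msg_id] = sorted(found)
    return result


RANKS = descendants_by_post(ROWS)
for msg_id in sorted(RANKS):
    print(msg_id, len(RANKS[msg_id]), RANKS[msg_id])
```

The rank of a post is simply the length of its list, which is what the Cypher query needs to reproduce.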

I need a Cypher query that produces this ranking,
but I'm not sure how to write it.
The project is in Python; I'm using Python 2.7, py2neo 2.0.3, and neo4j 2.1.6.


Solution

  • This query should return a result set similar to your table (but without the first column):

    MATCH (m:MSG)
    OPTIONAL MATCH (c:MSG)-[:CHILD_OF*1..]->(m)
    WITH m, COLLECT(DISTINCT c.id) AS childMsgIds
    RETURN m.id AS `msg id`, LENGTH(childMsgIds) AS Rank, childMsgIds AS `List of all responses`
    

    Does this suit your needs?
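
    From Python, the query could be run roughly as follows. This is a sketch under assumptions: it relies on the py2neo 2.0 `graph.cypher.execute` call shown in the question, reads each result row by column position, and sorts in Python so that the highest-ranked posts come first. The names `RANK_QUERY` and `rank_posts` are illustrative.

```python
# Same query as above, written as one Python string.
RANK_QUERY = (
    "MATCH (m:MSG) "
    "OPTIONAL MATCH (c:MSG)-[:CHILD_OF*1..]->(m) "
    "WITH m, COLLECT(DISTINCT c.id) AS childMsgIds "
    "RETURN m.id AS `msg id`, LENGTH(childMsgIds) AS Rank, "
    "childMsgIds AS `List of all responses`"
)


def rank_posts(graph):
    """Return (msg id, rank, responses) tuples, highest rank first."""
    rows = [
        (record[0], record[1], list(record[2]))  # columns in RETURN order
        for record in graph.cypher.execute(RANK_QUERY)
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

    Calling `rank_posts(graph)` with the `graph` object from the question then gives one tuple per post, ready to print or store.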