Tags: neo4j, cypher, neo4j-apoc

How to merge nodes of the same community with Cypher in Neo4j?


There is a graph in which each node has a communityId property indicating which community it belongs to. The nodes are connected by LINK relationships, each of which has a weight property.

What I want is to merge all nodes of the same community into one big node. The links between these big nodes (i.e. the communities) must be combined sensibly: the weight of a link from one community to another should be the sum of the weights of all links from nodes in the first community to nodes in the second. Link direction must be respected.

In the resulting graph I should see only the connected community nodes.
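To make the desired aggregation concrete, a read-only query along these lines previews the community-level links without modifying anything. This is a sketch that assumes the LINK type and the weight and communityId properties used in the sample graph in the Reproducibility section; on that data it should return a single row, community 2 to community 1 with weight 2, coming from the B->Z and B->W links.

```cypher
// Read-only preview: sum LINK weights between distinct communities, keeping direction.
MATCH (a:User)-[r:LINK]->(b:User)
WHERE a.communityId <> b.communityId   // links inside one community are ignored
RETURN a.communityId AS fromCommunity,
       b.communityId AS toCommunity,
       sum(r.weight)  AS weight
ORDER BY fromCommunity, toCommunity
```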

The closest thing I found is the APOC procedure apoc.refactor.mergeNodes(). However, I'm not satisfied with the result because:

  • Problem 1: the resulting community nodes have self-links.
  • Problem 2: the link weights are not combined, although the documentation says they should be.

Problem 1 can be fixed with one more Cypher query that removes the self-links. Problem 2, however, seems to require low-level access to the graph, the kind that mergeNodes() itself has.
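For problem 1, a follow-up query along these lines should remove the self-links left behind by apoc.refactor.mergeNodes(). It is a sketch that assumes the merged community nodes keep the User label and the merged relationships keep the LINK type; it does nothing about the weights (problem 2).

```cypher
// Delete every LINK relationship whose start and end node are the same node (a self-loop).
// Assumes the merged community nodes still carry the :User label.
MATCH (n:User)-[r:LINK]->(n)
DELETE r
```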

Is there an elegant way to build the desired graph (community nodes) in one go? If not, problem 2 at least needs to be fixed somehow.

Reproducibility

Graph:

CREATE (a:User {name: "A", communityId: 2}), (b:User {name: "B", communityId: 2}), (c:User {name: "C", communityId: 2}), (x:User {name: "X", communityId: 1}), (y:User {name: "Y", communityId: 1}), (z:User {name: "Z", communityId: 1}), (w:User {name: "W", communityId: 1}),
  (a)-[:LINK {weight: 1}]->(b), (b)-[:LINK {weight: 1}]->(c), (c)-[:LINK {weight: 1}]->(a), (b)-[:LINK {weight: 1}]->(z), (z)-[:LINK {weight: 1}]->(x), (z)-[:LINK {weight: 1}]->(w), (w)-[:LINK {weight: 1}]->(y), (y)-[:LINK {weight: 1}]->(x), (b)-[:LINK {weight: 1}]->(w)


Cypher:

MATCH (n:User)
WITH n.communityId AS communityId, COLLECT(n) AS nodes
CALL apoc.refactor.mergeNodes(nodes, {
    properties: {
        name: 'combine',
        communityId: 'discard',
        weight: 'combine'
    },
    mergeRels: true
})
YIELD node
RETURN node


System Requirements

  • Windows 8.1 x64
  • Neo4j Desktop v1.3.4 (Engine v4.1.1)
  • APOC v4.1.0.2
  • Graph Data Science Library v1.3.2

Solution

  • I am not quite sure why APOC is not merging the relationships in your example. However, here is a Cypher query to get you started:

    MATCH (n:User)-[r:LINK]->(v:User)
    WHERE n.communityId <> v.communityId  // skip intra-community links (they would become self-loops)
    WITH n.communityId AS comId1, v.communityId AS comId2, sum(r.weight) AS w
    MERGE (su1:SuperUser {communityId: comId1})  // create or get the super node for the source community
    MERGE (su2:SuperUser {communityId: comId2})  // create or get the super node for the target community
    MERGE (su1)-[sl:SUPER_LINK]->(su2)
    ON CREATE SET sl.weight = w  // set the summed weight when the super link is created
    RETURN su1, su2, sl
    

    which creates the following nodes and relationship:

    [Image: super node and super link]
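If each super node should also record which users it replaces (similar to name: 'combine' in the question's mergeNodes() call), a second query could copy the member names over. This is a hedged sketch: the members property name is a hypothetical choice, not something APOC defines, and the query assumes the SuperUser nodes created above already exist.

```cypher
// For every super node, collect the names of the users in the same community.
// "members" is an illustrative property name chosen for this example.
MATCH (su:SuperUser)
MATCH (n:User)
WHERE n.communityId = su.communityId
WITH su, collect(n.name) AS members
SET su.members = members
RETURN su.communityId AS communityId, su.members AS members
```

Run after the solution query on the sample graph, this should give community 2 the names A, B and C and community 1 the names X, Y, Z and W (the order inside each list is not guaranteed).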