Triangle Counting/Clustering Neo4j


I would like to test Triangle Clustering in my Neo4j graph. Here is a sample:

CREATE(a:Person { name: "a" })-[:FRIENDS]->(b:Person {name : "b"}),
(a)-[:WORKS_AT]->(p:Business {name : "Mcdonalds"}),
(b)-[:WORKS_AT]->(p),
(c:Person { name: "c"})-[:FRIENDS]->(a),
(c)-[:FRIENDS]->(b),
(d:Person { name: "d"})-[:FRIENDS]->(a)
return *

MATCH (c:Person {name: "c"}),(p:Business {name : "Mcdonalds"}), (d:Person { name: "d"}),(b:Person {name : "b"})
CREATE (c)-[:WORKS_AT]->(p),
(e:Person { name: "e"})-[:FRIENDS]->(c),
(d)-[:FRIENDS]->(c),
(d)-[:FRIENDS]->(e),
(f:Person { name: "f"})-[:FRIENDS]->(b),
(g:Person { name: "g"})-[:FRIENDS]->(b),
(i:Person { name: "i"})-[:FRIENDS]->(b),
(h:Person { name: "h"})-[:FRIENDS]->(b),
(j:Person { name: "j"})-[:FRIENDS]->(b),
(k:Person { name: "k"})-[:FRIENDS]->(b)
return *

MATCH (g:Person {name: "g"}),(f:Person {name: "f"}),(c:Person {name: "c"}), (e:Person {name: "e"})
CREATE (g)-[:FRIENDS]->(c),
(f)-[:FRIENDS]->(c),
(g)-[:FRIENDS]->(e)
return *

In my sample graph I would like to select nodes a, b, and c based on their :WORKS_AT relationship with Mcdonalds, then look at the nodes that have a :FRIENDS relationship with them and run a Triangle Count over those. I've gotten a partial answer with:

CALL algo.triangleCount(
  'MATCH (p:Person)-[]-(:Person)-[:WORKS_AT]-(:Business {name : "Mcdonalds"}) RETURN id(p) as id',
  'MATCH (p1:Person)-[:FRIENDS]->(p2:Person) RETURN id(p1) as source, id(p2) as target',
  {concurrency:4, write:true, writeProperty:'triangle',graph:'cypher', clusteringCoefficientProperty:'coefficient'})
YIELD loadMillis, computeMillis, writeMillis, nodeCount, triangleCount, averageClusteringCoefficient  

But I'd like to have something closer to what is listed in the stream example in the documentation with a breakdown of nodeId (in this example node.name), triangles, and coefficient.

I have gotten closer with:

CALL algo.triangleCount.stream(
  'MATCH (p:Person)-[]-(:Person)-[:WORKS_AT]-(:Business {name : "Mcdonalds"}) RETURN id(p) as id',
  'MATCH (p1:Person)-[:FRIENDS]->(p2:Person) RETURN id(p1) as source, id(p2) as target',
  {concurrency:4, write:true, writeProperty:'triangle',graph:'cypher', clusteringCoefficientProperty:'coefficient'})
YIELD nodeId, triangles, coefficient 
MATCH (p:Person) WHERE id(p) = nodeId
RETURN p.name as name, triangles, coefficient ORDER BY coefficient DESC

Solution

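CALL algo.triangleCount.stream('match (p:Person)-[*1..2]-(b:Business) return p', '[]', {concurrency:4})
YIELD nodeId, triangles, coefficient
MATCH (p:Person) WHERE id(p) = nodeId
RETURN p.name AS name, triangles, coefficient
ORDER BY triangles

Here's the answer I came up with. The key thing I was missing was the difference between algo.triangleCount and algo.triangleCount.stream. The stream variant returns one row per node (nodeId, triangles, coefficient), which is the breakdown I was after. The plain procedure runs the computation too, but it yields only summary figures (load, compute, and write times, node count, total triangle count, average clustering coefficient) and, with write:true, stores the per-node values in the configured properties instead of returning them.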
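To sanity-check the per-node numbers, the triangles can also be counted with plain Cypher, without the algorithms library. This is only a minimal sketch, not part of the original answer. It assumes that FRIENDS relationships should be treated as undirected and that only Person nodes matter. People who are in no triangle simply do not appear in the result.

```cypher
// Cross-check: count FRIENDS triangles per person using plain Cypher.
// Relationship direction is ignored (assumption: friendship is symmetric).
MATCH (p:Person)-[:FRIENDS]-(x:Person)-[:FRIENDS]-(y:Person)-[:FRIENDS]-(p)
// Keep one ordering of the two other corners so each triangle is seen once per p.
WHERE id(x) < id(y)
// Collapse duplicates caused by parallel relationships between the same pair of nodes.
WITH DISTINCT p, x, y
RETURN p.name AS name, count(*) AS triangles
ORDER BY triangles DESC, name
```

If these counts differ from what the procedure reports, the usual cause is that the node or relationship queries passed to the procedure select a different subgraph than the one matched here.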