
Is there a method to access specific node and relationship data within a graph catalog in Neo4j GDS?


In Neo4j Graph Data Science (GDS), each graph can have a corresponding projection. How can I obtain detailed information about the actual nodes and relationships stored within a specific projection? Is there a method to access specific node and relationship data within a projection?

For a specific graph catalog projection, graph_0, I can use CALL gds.graph.list('graph_0') to retrieve basic information about the projection, such as the number of nodes and relationships. However, it lists no detailed information about the individual nodes and relationships.

------- An example to explain in more detail -------

  1. Create a simple graph with the following Cypher command:
CREATE
  (alice:Buyer {name: 'Alice'}),
  (instrumentSeller:Seller {name: 'Instrument Seller'}),
  (bob:Buyer {name: 'Bob'}),
  (carol:Buyer {name: 'Carol'}),
  (alice)-[:PAYS { amount: 1.0}]->(instrumentSeller),
  (alice)-[:PAYS { amount: 2.0}]->(instrumentSeller),
  (alice)-[:PAYS { amount: 3.0}]->(instrumentSeller),
  (alice)-[:PAYS { amount: 4.0}]->(instrumentSeller),
  (alice)-[:PAYS { amount: 5.0}]->(instrumentSeller),
  (alice)-[:PAYS { amount: 6.0}]->(instrumentSeller),

  (bob)-[:PAYS { amount: 3.0}]->(instrumentSeller),
  (bob)-[:PAYS { amount: 4.0}]->(instrumentSeller),
  (carol)-[:PAYS { amount: 5.0}]->(bob),
  (carol)-[:PAYS { amount: 6.0}]->(bob)


  2. Project it:
MATCH (source)
OPTIONAL MATCH (source)-[r]->(target)
WITH gds.graph.project(
'graph_0',
source,
target,
{
sourceNodeLabels: labels(source),
targetNodeLabels: labels(target),
relationshipType: type(r)
}
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

// return: 
//graph     nodes   rels
//"graph_0" 4       10
  3. Run random walk with restarts (RWR) sampling on the example graph:
MATCH (start:Buyer {name: 'Alice'})
CALL gds.graph.sample.rwr('mySample', 'graph_0', { samplingRatio: 0.66, startNodes: [id(start)] })
YIELD nodeCount, relationshipCount
RETURN nodeCount, relationshipCount
// return:
// nodeCount    relationshipCount
// 3            8
  4. In this example, there are multiple relationships of the same type between two nodes, e.g. PAYS between Alice and Instrument Seller. I want to know exactly which relationships (identified by relationship ID) were sampled. However, the methods currently provided only reveal which relationship types were sampled, not the exact relationship IDs.

Solution

  • Unfortunately, GDS (as of version 2.4.4) does not provide a way to get relationship IDs out of a projection.

    But there is a somewhat ugly workaround. You can add to every relationship of interest a special property (say, _relId) containing that relationship's native ID. Then you can include that property when creating the projection, and get back that property after you generate your GDS results.

    For example, in your use case:

    • Add _relId to all relationships (since they are all of interest in your use case).

      MATCH ()-[r]->()
      SET r._relId = ID(r)
      
    • Project, and include _relId property.

      MATCH (source)
      OPTIONAL MATCH (source)-[r]->(target)
      WITH gds.graph.project(
        'graph_0',
        source,
        target,
        {
          sourceNodeLabels: labels(source),
          targetNodeLabels: labels(target),
          relationshipType: type(r),
          relationshipProperties: r { ._relId }
        }
      ) AS g
      RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
      
    • Run the random walk with restarts sampling.

      MATCH (start:Buyer {name: 'Alice'})
      CALL gds.graph.sample.rwr('mySample', 'graph_0', { samplingRatio: 0.66, startNodes: [id(start)] })
      YIELD nodeCount, relationshipCount
      RETURN nodeCount, relationshipCount
      
    • Get relationships from output projection mySample, including the native relationship IDs (relNativeId in the result).

      CALL gds.graph.relationshipProperties.stream('mySample', ['_relId'])
      YIELD sourceNodeId, targetNodeId, relationshipType, propertyValue
      RETURN sourceNodeId, targetNodeId, relationshipType, TOINTEGER(propertyValue) AS relNativeId
      

    Here is a sample result:

    ╒════════════╤════════════╤════════════════╤═══════════╕
    │sourceNodeId│targetNodeId│relationshipType│relNativeId│
    ╞════════════╪════════════╪════════════════╪═══════════╡
    │171         │172         │"PAYS"          │256        │
    ├────────────┼────────────┼────────────────┼───────────┤
    │171         │172         │"PAYS"          │253        │
    ├────────────┼────────────┼────────────────┼───────────┤
    │171         │172         │"PAYS"          │257        │
    ├────────────┼────────────┼────────────────┼───────────┤
    │171         │172         │"PAYS"          │255        │
    ├────────────┼────────────┼────────────────┼───────────┤
    │171         │172         │"PAYS"          │258        │
    ├────────────┼────────────┼────────────────┼───────────┤
    │171         │172         │"PAYS"          │254        │
    └────────────┴────────────┴────────────────┴───────────┘
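
    If you want the sampled relationships themselves rather than bare IDs, the streamed values can be joined back to the database. This is only a minimal sketch: it assumes the database has not changed since the projection was created, and it relies on the name and amount properties from the question's example data.

    ```cypher
    // Stream the stored native IDs from the sampled projection,
    // then look up each relationship in the database by that ID.
    CALL gds.graph.relationshipProperties.stream('mySample', ['_relId'])
    YIELD propertyValue
    WITH toInteger(propertyValue) AS relNativeId
    MATCH (s)-[r]->(t)
    WHERE id(r) = relNativeId
    RETURN relNativeId,
           s.name AS source,
           t.name AS target,
           type(r) AS type,
           r.amount AS amount
    ```

    The amount column is what lets you tell the parallel PAYS relationships between the same two nodes apart.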
    

    Caveat: If you delete a projected relationship from the DB, the projection (including that relationship's _relId value) becomes stale. Because Neo4j can reuse internal IDs, the stale _relId value would then either refer to nothing or refer to some unrelated new relationship.
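
    One way to guard against this is to check each streamed ID against the database before trusting it. The following is a rough sketch, not a complete safeguard: it assumes the projection used the default (natural) orientation, so source and target match the stored direction, and it treats an ID as stale when no relationship with that ID connects the same two nodes.

    ```cypher
    // For every relationship in the sample, try to find a database
    // relationship with the stored ID between the same two nodes.
    CALL gds.graph.relationshipProperties.stream('mySample', ['_relId'])
    YIELD sourceNodeId, targetNodeId, propertyValue
    WITH sourceNodeId, targetNodeId, toInteger(propertyValue) AS relNativeId
    OPTIONAL MATCH (s)-[r]->(t)
    WHERE id(r) = relNativeId
      AND id(s) = sourceNodeId
      AND id(t) = targetNodeId
    // r is null when the ID is gone or now belongs to a different relationship
    RETURN relNativeId, r IS NULL AS stale
    ```

    A reused ID that happens to connect the same pair of nodes would still pass this check, so re-projecting after deletions remains the safer option.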