
Neo4j Graph Data Science Library: unable to project node properties for embeddings


Summary: Using the Neo4j GDS library, I would like to cluster my graph data. Below is a test example of the data; the values in it are (I hope) properties of the graph nodes. I am doing the analysis in the following steps:

  1. Load the data from a CSV file and create the nodes and relationships
  2. Project the graph with the needed nodes and their properties
  3. Compute node embeddings
  4. Use the embeddings as input to clustering algorithms such as k-means

Here are the example data and the Cypher queries, along with the output of the projection.
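Step 4 boils down to running a clustering algorithm over the embedding vectors. GDS has its own K-Means implementation; purely to illustrate what that step does with the vectors, here is a minimal plain-Python sketch of Lloyd's k-means. It assumes the embeddings have already been exported as lists of floats. The toy vectors in the usage lines are invented inputs, not FastRP output, and the function name `kmeans` and its deterministic first-k seeding are this sketch's own choices, not GDS behavior.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means: return a cluster index (0..k-1) for each point.

    Seeding is deliberately simple and deterministic: the first k points
    are the initial centroids (real implementations usually seed randomly
    or with k-means++).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [list(p) for p in points[:k]]
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster: converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Toy 2-D "embeddings" (made up for illustration).
toy = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
print(kmeans(toy, k=2))  # [0, 0, 1, 1]
```

In practice the embedding vectors would have 64 dimensions (the embeddingDimension used below), but the algorithm is the same for any length.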

My data is available in a CSV file:

scene_id,obj_type,lane
1,1,1
1,2,3
2,3,2
3,4,1
3,1,3
4,1,2
4,3,1
5,2,2
6,4,3
6,3,1
6,1,2

Cypher query for loading the data and creating the nodes and relationships:

LOAD CSV WITH HEADERS FROM 'file:///modified_data.csv' AS row
MERGE (image:Image {scene_id: toFloat(row['scene_id'])})
MERGE (object:Object {obj_type: toFloat(row['obj_type'])})
MERGE (lane:Lane {direction: toFloat(row['lane'])})
MERGE (image)-[:CONTAINS]->(object)
MERGE (object)-[:IN_LANE]->(lane)
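Because the statement uses MERGE rather than CREATE, a node or relationship is created only if an identical one does not already exist. The plain-Python sketch below models that with sets (an approximation of MERGE's get-or-create behavior, not Neo4j's storage model) to count what the load produces from the sample CSV; the function name `simulate_merge` is hypothetical.

```python
import csv
import io

# The sample data from the question, inlined so the sketch is self-contained.
CSV_TEXT = """scene_id,obj_type,lane
1,1,1
1,2,3
2,3,2
3,4,1
3,1,3
4,1,2
4,3,1
5,2,2
6,4,3
6,3,1
6,1,2
"""


def simulate_merge(csv_text: str) -> dict:
    """Count the nodes and relationships the LOAD CSV ... MERGE statement creates.

    Sets stand in for MERGE: adding an existing key is a no-op.
    """
    images, objects, lanes = set(), set(), set()
    contains, in_lane = set(), set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        scene = float(row["scene_id"])  # mirrors toFloat(row['scene_id'])
        obj = float(row["obj_type"])
        lane = float(row["lane"])
        images.add(scene)
        objects.add(obj)
        lanes.add(lane)
        contains.add((scene, obj))  # (image)-[:CONTAINS]->(object)
        in_lane.add((obj, lane))    # (object)-[:IN_LANE]->(lane)
    return {
        "Image": len(images), "Object": len(objects), "Lane": len(lanes),
        "CONTAINS": len(contains), "IN_LANE": len(in_lane),
    }


counts = simulate_merge(CSV_TEXT)
print(counts)
# {'Image': 6, 'Object': 4, 'Lane': 3, 'CONTAINS': 11, 'IN_LANE': 9}
```

The totals, 13 nodes and 20 relationships, agree with the Node Count and Relationship Count in the projection output. Note that Object is merged on obj_type alone, so every row with the same obj_type reuses one Object node: the 11 rows yield only 4 Object nodes, and object type 1, for example, ends up linked to all three lanes.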

Graph projection query that causes the error:

CALL gds.graph.project(
  'imgraph1',
  {
    Image: {properties: 'scene_id'},
    Object: {properties: 'obj_type'},
    Lane: {properties: 'direction'}
  },
  ['CONTAINS', 'IN_LANE']
)

The output of this query looks fine:

Graph Name: imgraph1
Node Count: 13
Relationship Count: 20
nodeProjection: {'Lane': {'label': 'Lane', 'properties': {'direction': {'property': 'direction', 'defaultValue': None}}}, 'Image': {'label': 'Image', 'properties': {'scene_id': {'property': 'scene_id', 'defaultValue': None}}}, 'Object': {'label': 'Object', 'properties': {'obj_type': {'property': 'obj_type', 'defaultValue': None}}}}

Next comes the embedding query:

CALL gds.fastRP.mutate(
  'imgraph1', 
  {
    embeddingDimension: 64,
    featureProperties : ['scene_id', 'obj_type','direction'],
    iterationWeights: [0.1, 0.1, 0.1, 0.1, 0.1, 0.5],
    mutateProperty: 'embedding'
  }
)

This query fails with the following error:

Failed to invoke procedure gds.fastRP.mutate: Caused by: java.lang.IllegalArgumentException: The feature properties ['direction', 'obj_type', 'scene_id'] are not present for all requested labels. Requested labels: ['Image', 'Lane', 'Object']. Properties available on all requested labels: []
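The message follows from a simple rule: a feature property is usable only if it exists on every requested label. The plain-Python sketch below models that rule; it is not GDS's actual validation code, and `properties_per_label` and `check_feature_properties` are hypothetical names. It reads the per-label properties from the nodeProjection map printed by the projection, intersects them, and reproduces the reported message.

```python
from typing import Dict, List, Set

# nodeProjection copied from the projection output.
NODE_PROJECTION = {
    'Lane': {'label': 'Lane', 'properties': {'direction': {'property': 'direction', 'defaultValue': None}}},
    'Image': {'label': 'Image', 'properties': {'scene_id': {'property': 'scene_id', 'defaultValue': None}}},
    'Object': {'label': 'Object', 'properties': {'obj_type': {'property': 'obj_type', 'defaultValue': None}}},
}


def properties_per_label(projection: Dict[str, dict]) -> Dict[str, Set[str]]:
    """Map each projected label to the set of property keys it carries."""
    return {label: set(spec['properties']) for label, spec in projection.items()}


def check_feature_properties(label_props: Dict[str, Set[str]], features: List[str]) -> Set[str]:
    """Return the properties shared by all labels; raise if any feature is missing on some label."""
    available = set.intersection(*label_props.values())
    missing = sorted(set(features) - available)
    if missing:
        raise ValueError(
            f"The feature properties {missing} are not present for all requested labels. "
            f"Requested labels: {sorted(label_props)}. "
            f"Properties available on all requested labels: {sorted(available)}"
        )
    return available


try:
    check_feature_properties(properties_per_label(NODE_PROJECTION),
                             ['scene_id', 'obj_type', 'direction'])
except ValueError as err:
    print(err)
```

Requesting even a single feature such as ['scene_id'] fails the same way, because Lane and Object nodes do not carry scene_id.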


Solution

  • To use all three properties (scene_id, obj_type and direction) as FastRP features, every projected node must have all three properties. In your projection, each label carries only one of them, so the set of properties shared by all requested labels is empty, which is exactly what the error message reports.

    One option would be to use the global nodeProperties configuration in a native projection to provide default values for nodes that lack those properties (I've picked 0 just for illustration):

    // If 'imgraph1' already exists, drop it first: CALL gds.graph.drop('imgraph1')
    CALL gds.graph.project(
      'imgraph1',
      ['Image', 'Object', 'Lane'],
      ['CONTAINS', 'IN_LANE'],
      {
        nodeProperties: {
          scene_id: {defaultValue: 0},
          obj_type: {defaultValue: 0},
          direction: {defaultValue: 0}
        }
      }
    );
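With default values in place, every projected node carries all three properties, so the intersection that was empty before now contains all three. The plain-Python sketch below (helper name `with_defaults` is hypothetical; one sample node per label, using values from the CSV) shows each node's property values once a missing property falls back to the default of 0.

```python
from typing import Dict

FEATURES = ['scene_id', 'obj_type', 'direction']
DEFAULTS: Dict[str, float] = {name: 0.0 for name in FEATURES}  # defaultValue: 0 for each property

# One node per label, holding only the property it actually has in the database.
SAMPLE_NODES: Dict[str, Dict[str, float]] = {
    'Image': {'scene_id': 1.0},
    'Object': {'obj_type': 2.0},
    'Lane': {'direction': 3.0},
}


def with_defaults(props: Dict[str, float], defaults: Dict[str, float]) -> Dict[str, float]:
    """Return the node's value for every projected property, falling back to the default."""
    return {name: props.get(name, default) for name, default in defaults.items()}


filled = {label: with_defaults(props, DEFAULTS) for label, props in SAMPLE_NODES.items()}
for label, props in filled.items():
    print(label, [props[name] for name in FEATURES])
# Image [1.0, 0.0, 0.0]
# Object [0.0, 2.0, 0.0]
# Lane [0.0, 0.0, 3.0]

# Every label now has every feature property, so the shared set is complete.
shared = set.intersection(*(set(p) for p in filled.values()))
print(sorted(shared))  # ['direction', 'obj_type', 'scene_id']
```

Using 0 as a "missing" marker works here only because every real value in the sample data is at least 1; if 0 could be a genuine value, a default outside the data range would keep the two cases apart.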
    

    Alternatively, you could merge the three properties into a single new property during the projection itself, using a Cypher aggregation projection. In this simple example, the first non-null value among the three properties populates a new property p:

    MATCH (s)-[r]->(t)
    WITH gds.graph.project(
        'imgraph1',
        s,
        t,
        {
            sourceNodeLabels: labels(s),
            sourceNodeProperties: s { p: coalesce(s.scene_id, s.obj_type, s.direction)},
            targetNodeLabels: labels(t),
            targetNodeProperties: t { p: coalesce(t.scene_id, t.obj_type, t.direction)},
            relationshipType: type(r)
        }
    ) AS g
    RETURN *
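Cypher's coalesce() returns its first non-null argument, so each node's p is whichever of the three properties that node actually has. The plain-Python sketch below mirrors that logic (the names `coalesce` and `merged_p` are this sketch's own). With p present on every node, the FastRP call would then list only that property, i.e. featureProperties: ['p'].

```python
from typing import Dict, Optional


def coalesce(*values: Optional[float]) -> Optional[float]:
    """First argument that is not None, like Cypher's coalesce(); None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


def merged_p(node: Dict[str, float]) -> Optional[float]:
    # Same argument order as the query: scene_id, then obj_type, then direction.
    return coalesce(node.get('scene_id'), node.get('obj_type'), node.get('direction'))


print(merged_p({'scene_id': 4.0}))   # 4.0  (an Image node)
print(merged_p({'obj_type': 3.0}))   # 3.0  (an Object node)
print(merged_p({'direction': 1.0}))  # 1.0  (a Lane node)
```

The trade-off is that p no longer records which property a value came from: scene 1, object type 1 and lane 1 all get p = 1.0, so the feature alone does not distinguish them; only the node labels kept in the projected graph do.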