Each node has a unique nodeId property, which is guaranteed by Neo4j constraints. Each relationship has a unique id property.
The Neo4j version is 4.3.7.
Light green nodes are companies, pink nodes are people, and the other nodes hold additional information that cannot be stored inside a node. Here, "Peter company" has 2 charges, a primary address, a secondary address, a date of creation and a date of termination.
I would like to remove outdated information from "Peter company" and its director. For this I am using the following query:
UNWIND $batch AS data
MATCH (n:Entity { nodeId: data.entityId })
OPTIONAL MATCH (n)-[rel]->(prop:Property)
WHERE NOT prop.nodeId IN data.propertyIds
DELETE rel
WITH prop
WHERE NOT exists(()-->(prop))
DELETE prop
where $batch in this example is
[
{
'entityId': '0000',
'propertyIds': ['0002', '0003', '0004', '0005', '0006', '0009']
},
{
'entityId': '0001',
'propertyIds': ['0004', '0010']
},
]
entityId is the nodeId of an entity node, and propertyIds are the nodeId values of its current additional information (properties).
If there is a relationship from an entity to a property whose ID is not in propertyIds, that information is old and the edge between them must be removed. In addition, if the property then has no more incoming edges (it can only have incoming ones), it is deleted.
The list above contains the IDs of the company and its director, together with the IDs of their current properties.
The date that is also connected to "Other company" is obsolete for "Peter company", so its nodeId is not present in the batch above. As a result of the query, I expect the relationship from the company to the old property to be removed, while the property itself is not deleted.
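To make the expected outcome concrete, here is a minimal pure-Python sketch of the rule described above. It does not touch Neo4j; the names EDGES, PROPERTY_IDS and prune are hypothetical and exist only for this illustration, and the edge list is copied by hand from the example graph. Unlike the Cypher query, it checks for orphaned properties once at the end rather than row by row.

```python
# Pure-Python model of the intended pruning rule (illustration only, no Neo4j).
# Each edge is (source nodeId, target nodeId), taken from the example graph.
EDGES = [
    ("0000", "0002"), ("0000", "0003"), ("0000", "0004"), ("0000", "0005"),
    ("0000", "0006"), ("0000", "0007"), ("0000", "0009"),
    ("0001", "0000"), ("0001", "0004"), ("0001", "0010"),
    ("0008", "0007"),  # "Other company" also points at date 0007
]
# nodeIds of nodes that carry the Property label.
PROPERTY_IDS = {"0002", "0003", "0004", "0005", "0006", "0007", "0009", "0010"}


def prune(edges, property_ids, batch):
    """Return (removed_edges, deleted_property_ids) under the intended rule."""
    remaining = list(edges)
    removed = []
    touched = set()  # properties that lost at least one incoming edge
    for item in batch:
        keep = set(item["propertyIds"])
        for edge in list(remaining):
            src, dst = edge
            # Only entity -> property edges whose target is not current.
            if src == item["entityId"] and dst in property_ids and dst not in keep:
                remaining.remove(edge)
                removed.append(edge)
                touched.add(dst)
    # A touched property is deleted only if nothing points at it any more.
    still_referenced = {dst for _, dst in remaining}
    deleted = sorted(touched - still_referenced)
    return removed, deleted


batch = [
    {"entityId": "0000",
     "propertyIds": ["0002", "0003", "0004", "0005", "0006", "0009"]},
    {"entityId": "0001", "propertyIds": ["0004", "0010"]},
]
removed, deleted = prune(EDGES, PROPERTY_IDS, batch)
print(removed)  # the single stale edge from "Peter company" to date 0007
print(deleted)  # empty: 0007 is still referenced by "Other company"
```

In this model only the edge from 0000 to 0007 is removed and no property is deleted, which is the result I expect from the Cypher query.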
But I got an error:
Cannot delete node<18>, because it still has relationships. To delete this node, you must first delete its relationships.
Why am I getting an error? Node 18 has an incoming relationship from "Other company", so exists(()-->(prop)) must return true.
If I change that expression to exists(()--(prop)), I get no error.
If I replace DELETE with SET in the query:
UNWIND $batch AS data
MATCH (n:Entity { nodeId: data.entityId })
OPTIONAL MATCH (n)-[rel]->(prop:Property)
WHERE NOT prop.nodeId IN data.propertyIds
SET rel.toPrune = true
WITH prop
WHERE NOT exists(()-->(prop))
SET prop.toPrune = true
Then the relationship is marked while the node is not, i.e., exists(()-->(prop)) returned true.
I created an example in Python that reproduces the problem:
from neo4j import GraphDatabase

with GraphDatabase.driver("bolt://localhost:7687", auth=('neo4j', 'neo')) as driver, \
        driver.session() as session:
    create_graph = """
        MERGE (n1:Test:Entity:Company {nodeId: "0000"}) SET n1.name = "Peter company"
        MERGE (n2:Test:Entity:Person {nodeId: "0001"}) SET n2.name = "Peter"
        MERGE (n3:Test:Property:Charge {nodeId: "0002"}) SET n3.status = "closed"
        MERGE (n4:Test:Property:Charge {nodeId: "0003"}) SET n4.status = "opened"
        MERGE (n5:Test:Property:Address {nodeId: "0004"}) SET n5.country = "France"
        MERGE (n6:Test:Property:Address {nodeId: "0005"})
            SET n6.country = "France"
            SET n6.city = "Ham Les Varennes"
        MERGE (n7:Test:Property:Date {nodeId: "0006"})
            SET n7.date = datetime("2014-09-04T00:00:00")
            SET n7.monthIsKnown = true
            SET n7.dayIsKnown = true
        MERGE (n8:Test:Property:Date {nodeId: "0007"})
            SET n8.date = datetime("1962-01-01T00:00:00")
            SET n8.monthIsKnown = false
            SET n8.dayIsKnown = false
        MERGE (n9:Test:Entity:Company {nodeId: "0008"}) SET n9.name = "Other company"
        MERGE (n10:Test:Property:Date {nodeId: "0009"})
            SET n10.date = datetime("1962-01-01T00:00:00")
            SET n10.monthIsKnown = false
            SET n10.dayIsKnown = false
        MERGE (n11:Test:Property:Date {nodeId: "0010"})
            SET n11.date = datetime("1976-01-01T00:00:00")
            SET n11.monthIsKnown = false
            SET n11.dayIsKnown = false
        MERGE (n1)-[:HAS_CHARGE {id: 1}]->(n3)
        MERGE (n1)-[:HAS_CHARGE {id: 2}]->(n4)
        MERGE (n1)-[:HAS_PRIMARY_ADDRESS {id: 3}]->(n5)
        MERGE (n1)-[:HAS_SECONDARY_ADDRESS {id: 4}]->(n6)
        MERGE (n1)-[:HAS_TERMINATION_DATE {id: 5}]->(n7)
        MERGE (n1)-[:HAS_CREATION_DATE {id: 6}]->(n8)
        MERGE (n1)-[:HAS_CREATION_DATE {id: 7}]->(n10)
        MERGE (n2)-[:FR_DIRECTOR {id: 8}]->(n1)
        MERGE (n2)-[:HAS_COUNTRY_OF_RESIDENCE {id: 9}]->(n5)
        MERGE (n2)-[:HAS_DATE_OF_BIRTH {id: 10}]->(n11)
        MERGE (n9)-[:HAS_CREATION_DATE {id: 11}]->(n8)
    """
    with session.begin_transaction() as tx:
        tx.run(create_graph)

    batch = [
        {
            'entityId': '0000',
            'propertyIds': ['0002', '0003', '0004', '0005', '0006', '0009']
        },
        {
            'entityId': '0001',
            'propertyIds': ['0004', '0010']
        },
    ]
    clean_old_properties = """
        UNWIND $batch AS data
        MATCH (n:Entity { nodeId: data.entityId })
        OPTIONAL MATCH (n)-[rel]->(prop:Property)
        WHERE NOT prop.nodeId IN data.propertyIds
        DELETE rel
        WITH prop
        WHERE NOT exists(()-->(prop))
        DELETE prop
    """
    with session.begin_transaction() as tx:
        tx.run(clean_old_properties, dict(batch=batch))
An interesting note: if both queries in this example are executed in a single transaction, no error is raised.
The error was caused by a bug in the Neo4j kernel, which was fixed in version 4.4.12:
Kernel
- Fix node degree calculation when relationships are removed in the same transaction
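If upgrading is not possible right away, one option that may avoid the problem is to split the cleanup into two transactions, so that the degree check never runs in a transaction that has just removed relationships. The sketch below is an assumption based on the fix description above and has not been verified against 4.3.7. The names REMOVE_STALE_RELS, DELETE_ORPHANS, remove_stale_relationships and delete_orphaned_properties are hypothetical; tx is expected to be a driver transaction whose run() accepts keyword parameters.

```python
# Phase 1: drop stale relationships and report which properties were affected.
REMOVE_STALE_RELS = """
UNWIND $batch AS data
MATCH (n:Entity {nodeId: data.entityId})-[rel]->(prop:Property)
WHERE NOT prop.nodeId IN data.propertyIds
DELETE rel
RETURN DISTINCT prop.nodeId AS nodeId
"""

# Phase 2: delete the affected properties that no longer have incoming edges.
DELETE_ORPHANS = """
UNWIND $ids AS propId
MATCH (prop:Property {nodeId: propId})
WHERE NOT exists(()-->(prop))
DELETE prop
"""


def remove_stale_relationships(tx, batch):
    """Run phase 1 and return the nodeIds of properties that lost an edge."""
    result = tx.run(REMOVE_STALE_RELS, batch=batch)
    # Consume the result while the transaction is still open.
    return [record["nodeId"] for record in result]


def delete_orphaned_properties(tx, node_ids):
    """Run phase 2 for the given candidate property nodeIds."""
    tx.run(DELETE_ORPHANS, ids=node_ids)
```

Each function is meant to run in its own transaction, for example by passing it to session.write_transaction() in the 4.x Python driver, with the list returned by the first call handed to the second. The trade-off is that the cleanup is no longer atomic: another writer could attach a new relationship between the two phases, although the exists check in phase 2 still guards against deleting a property that is referenced at that moment.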