Search code examples
neo4jcypher

Why neo4j cypher expression exists(() --> (n)) evaluates to false when actually it is true?


I have such a graph enter image description here

Each node has unique nodeId property which is guarantied by Neo4j constraints. Each relationships has unique id property. Version of neo4j is 4.3.7.

Light green nodes are companies, pink nodes are people and other nodes are additional info which cannot be stored inside a node. Here, "Peter company" has 2 charges, primary address, secondary address, date of creation and date of termination.

I would like to remove old information from "Peter company" and its director. For this I am using such a query:

UNWIND $batch AS data 
MATCH (n:Entity { nodeId: data.entityId }) 
OPTIONAL MATCH (n)-[rel]->(prop:Property) 
WHERE NOT prop.nodeId IN data.propertyIds
DELETE rel
WITH prop
WHERE NOT exists(()-->(prop))
DELETE prop

where $batch in this example is

[
    {
        'entityId':    '0000',
        'propertyIds': ['0002', '0003', '0004', '0005', '0006', '0009']
    },
    {
        'entityId':    '0001',
        'propertyIds': ['0004', '0010']
    },
]

entityId is nodeId of a node and propertyIds are nodeId of current additional information (properties). If there is a connection from entity to a property which ID is not in propertyIds, then this is an old information, and edge between them must be removed. In addition, if after that a property has no more incoming edges (it can have only incoming), it is deleted.

The list above contains IDs of company and its director and IDs of their current properties. A date that has a connection to "Other company" is obsolete to "Peter Company" and its nodeId is not present in the batch above. As a result of the query I expect that connection from the company to old property must be removed while the property is not deleted.

But I got an error:

Cannot delete node<18>, because it still has relationships. To delete this node, you must first delete its relationships.

Why I am getting an error? Node 18 has an incoming connection from "Other company", and thus exists(()-->(prop)) must return true.

If I change that expression to exists(()--(prop)), I get no error.

If I replace DELETE with SET in a query:

UNWIND $batch AS data 
MATCH (n:Entity { nodeId: data.entityId }) 
OPTIONAL MATCH (n)-[rel]->(prop:Property) 
WHERE NOT prop.nodeId IN data.propertyIds
SET rel.toPrune = true
WITH prop
WHERE NOT exists(()-->(prop))
SET prop.toPrune = true

Then relationship is marked while node is not, i. e., exists(()-->(prop)) returned true.

I created an example in Python that reproduces the problem:

from neo4j import GraphDatabase

with GraphDatabase.driver("bolt://localhost:7687", auth=('neo4j', 'neo')) as driver, \
        driver.session() as session:

    create_graph = """
    MERGE (n1:Test:Entity:Company {nodeId: "0000"}) SET n1.name = "Peter company"
    MERGE (n2:Test:Entity:Person {nodeId: "0001"}) SET n2.name = "Peter"
    MERGE (n3:Test:Property:Charge {nodeId: "0002"}) SET n3.status = "closed"
    MERGE (n4:Test:Property:Charge {nodeId: "0003"}) SET n4.status = "opened"
    MERGE (n5:Test:Property:Address {nodeId: "0004"}) SET n5.country = "France"
    MERGE (n6:Test:Property:Address {nodeId: "0005"})
        SET n6.country = "France"
        SET n6.city = "Ham Les Varennes"
    MERGE (n7:Test:Property:Date {nodeId: "0006"})
        SET n7.date = datetime("2014-09-04T00:00:00")
        SET n7.monthIsKnown = true
        SET n7.dayIsKnown = true
    MERGE (n8:Test:Property:Date {nodeId: "0007"})
        SET n8.date = datetime("1962-01-01T00:00:00")
        SET n8.monthIsKnown = false
        SET n8.dayIsKnown = false
    MERGE (n9:Test:Entity:Company {nodeId: "0008"}) SET n9.name = "Other company"
    MERGE (n10:Test:Property:Date {nodeId: "0009"})
        SET n10.date = datetime("1962-01-01T00:00:00")
        SET n10.monthIsKnown = false
        SET n10.dayIsKnown = false
    MERGE (n11:Test:Property:Date {nodeId: "0010"})
        SET n11.date = datetime("1976-01-01T00:00:00")
        SET n11.monthIsKnown = false
        SET n11.dayIsKnown = false
    
    MERGE (n1)-[:HAS_CHARGE {id: 1}]->(n3)
    MERGE (n1)-[:HAS_CHARGE {id: 2}]->(n4)
    MERGE (n1)-[:HAS_PRIMARY_ADDRESS {id: 3}]->(n5)
    MERGE (n1)-[:HAS_SECONDARY_ADDRESS {id: 4}]->(n6)
    MERGE (n1)-[:HAS_TERMINATION_DATE {id: 5}]->(n7)
    MERGE (n1)-[:HAS_CREATION_DATE {id: 6}]->(n8)
    MERGE (n1)-[:HAS_CREATION_DATE {id: 7}]->(n10)
    
    MERGE (n2)-[:FR_DIRECTOR {id: 8}]->(n1)
    MERGE (n2)-[:HAS_COUNTRY_OF_RESIDENCE {id: 9}]->(n5)
    MERGE (n2)-[:HAS_DATE_OF_BIRTH {id: 10}]->(n11)
    
    MERGE (n9)-[:HAS_CREATION_DATE {id: 11}]->(n8)
    """

    with session.begin_transaction() as tx:
        tx.run(create_graph)

    batch = [
        {
            'entityId':    '0000',
            'propertyIds': ['0002', '0003', '0004', '0005', '0006', '0009']
            },
        {
            'entityId':    '0001',
            'propertyIds': ['0004', '0010']
            },
        ]

    clean_old_properties = """
    UNWIND $batch AS data 
    MATCH (n:Entity { nodeId: data.entityId }) 
    OPTIONAL MATCH (n)-[rel]->(prop:Property) 
    WHERE NOT prop.nodeId IN data.propertyIds
    DELETE rel
    WITH prop
    WHERE NOT exists(()-->(prop))
    DELETE prop
    """

    with session.begin_transaction() as tx:
        tx.run(clean_old_properties, dict(batch=batch))

An interesting note: if both queries in this example are executed in one transaction, then no error emitted.


Solution

  • The error was caused by a bug inside Neo4j kernel. It was fixed in version 4.4.12.

    Kernel

    • Fix node degree calculation when relationships are removed in the same transaction

    Changelog

    Related issue