
How to update specific existing nodes in a graph database by loading an updated CSV file with Neo4j APOC


I am having trouble updating nodes by loading a recently updated CSV file into Neo4j. Since it is a large file, I think an APOC procedure needs to be used. I have managed to update existing nodes by loading the updated external file without APOC, but I need to do the update in parallel using APOC. Here is the content of my files.

Original rows in the file:

ID,SHOPNAME,DIVISION,DISTRICT,THANA
1795,ARAFAT DISTRIBUTION,RAJSHAHI,JOYPURHAT,Panchbibi
1796,CONNECT DISTRIBUTION,DHAKA,GAZIPUR,Gazipur Sadar
1797,HUMAYUN KABIR,DHAKA,DHAKA,Demra

I created the nodes from this CSV.

Then I have another, updated file, u.csv. Its rows are given below:

ID,SHOPNAME,DIVISION,DISTRICT,THANA
1795,ABC,RAJSHAHI,JOYPURHAT,Panchbibi
1796,XYZ,DHAKA,GAZIPUR,Gazipur Sadar
1797,HUMAYUN KABIR,DHAKA,DHAKA,Demra

Without APOC, my query was:

LOAD CSV FROM "file:///u.csv" AS line
MERGE (c:Agent {ID:line[0]})
ON MATCH SET c.SHOPNAME = line[1]
RETURN c

This query updated the desired property, except that I also got an extra, nearly blank node:

{"ID":"ID"}

My first question is: why is this new blank node created, and how can I prevent it?

Next, I want to run the update on a large file, so I used an APOC procedure for batch processing.

With APOC, my query was:

CALL apoc.periodic.iterate('LOAD CSV WITH HEADERS FROM "file:///u.csv" AS line return line','MERGE (p:Agent{ID:TOINTEGER(line.ID)}) ON MATCH SET p.SHOPNAME=TOINTEGER(line.SHOPNAME) ' ,{batchSize:10000, iterateList:true, parallel:true});

But this did not update the existing nodes. Instead it created two new nodes with the corresponding IDs, so I now have 5 nodes rather than 3:

{"ID":1795} 
{"ID":1796}

I am very new to Neo4j but trying to learn. Kindly help me solve this problem. I am using Neo4j 3.5.6 and APOC 3.5.0.4.


Solution

  • I see three possible issues here:

    • Regarding duplicate nodes: You used the TOINTEGER function in one load query but not in the other, so the nodes are duplicated. One Agent node has its ID stored as a string and the other has its ID stored as an integer, and MERGE treats the two values as different.

    Suggestion: Use TOINTEGER in both queries or in neither.

    • Regarding blank nodes: In your first query, LOAD CSV is used without WITH HEADERS, so the header row is read as a data row and MERGE creates a node with ID "ID". In your second query, you set the property only when a node is found (i.e. ON MATCH). But because of the first issue, the MERGE does not match any of the existing nodes and creates a new node each time, and no property is set on creation. So there will be nodes with no SHOPNAME.

    Suggestion: Either add ON CREATE to the MERGE query, or remove ON MATCH from the MERGE query and set the property every time. Adding ON CREATE is the recommended and more efficient way.

    Please find below query with ON CREATE:

    MERGE (c:Agent {ID:line[0]})
    ON CREATE SET 
        c.SHOPNAME = line[1]
    
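    • You are also converting SHOPNAME to an integer with TOINTEGER in your APOC query. This will not work: SHOPNAME holds text such as "ABC", which TOINTEGER cannot parse, so it returns null and the property is not set.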
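
    Putting the three suggestions together, the two queries could look roughly like the sketch below. It is only a sketch under some assumptions: the existing Agent nodes are assumed to store ID as an integer (if they were created with string IDs, drop toInteger from both queries instead), and SHOPNAME is kept as a string. Because parallel:true is kept from the question, concurrent MERGE calls in different batches may race with each other, so a uniqueness constraint on :Agent(ID) is assumed here as a safeguard; setting parallel:false is the more cautious alternative.

    // Assumed safeguard for parallel MERGE (Neo4j 3.5 syntax)
    CREATE CONSTRAINT ON (a:Agent) ASSERT a.ID IS UNIQUE;

    // Without APOC: WITH HEADERS keeps the header row from being loaded as data
    LOAD CSV WITH HEADERS FROM "file:///u.csv" AS line
    MERGE (c:Agent {ID: toInteger(line.ID)})
    ON CREATE SET c.SHOPNAME = line.SHOPNAME
    ON MATCH SET c.SHOPNAME = line.SHOPNAME
    RETURN c;

    // With APOC: same MERGE, run in batches; SHOPNAME is no longer converted
    CALL apoc.periodic.iterate(
      'LOAD CSV WITH HEADERS FROM "file:///u.csv" AS line RETURN line',
      'MERGE (p:Agent {ID: toInteger(line.ID)})
       ON CREATE SET p.SHOPNAME = line.SHOPNAME
       ON MATCH SET p.SHOPNAME = line.SHOPNAME',
      {batchSize: 10000, iterateList: true, parallel: true});

    Nodes already duplicated by the earlier runs (the {"ID":"ID"} node and the extra integer-ID or string-ID copies) are not cleaned up by these queries and would need to be deleted separately before the constraint can be created.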