neo4j, cypher, py2neo, neo4j-apoc

How should I convert this neo4j Cypher/Apoc load to neo4j-admin import?


I am working with email data, which I parse with Python to produce a CSV every hour. For each CSV I run five separate LOAD CSV commands to create or update nodes and relationships. They are NO ATTACHMENT OR LINK, URL ONLY, ATTACHMENT ONLY, URL AND ATTACHMENT, and Attachment to Attachment Name, FileName Node.

I would like to import these automatically via a batch job. Because I am familiar with it, I wanted to do this in Python, but on Stack Overflow and elsewhere people recommend neo4j-admin import. From the documentation, its --nodes and --relationships approach looks very different from what I have been doing. Can anyone show me how to convert the Cypher/APOC LOAD CSV example below into a neo4j-admin import?

// URL AND ATTACHMENT
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM ("file:///sessions/4_hour_parsed_and_ready.csv") AS row
MERGE (a:Sender { name: row.From, domain: row.Sender_Sub_Fld})
MERGE (b:Link { name: row.Url_Sub_Fld, topLevelDomain: row.Url_Tld, htmlEncodedMessage: row.HTML_Encoded})
MERGE (c:Attachment { name: row.FileHash, fileExtension: row.FileName_Ext, containsMultipleExtensions: row.MultipleExtensions})
MERGE (d:Recipient { name: row.To})
WITH a,b,c,d,row
WHERE NOT row.Url_Tld = "false" AND NOT row.FileHash = "false"
CALL apoc.merge.relationship(a, row.Outcome, {}, {}, b) YIELD rel as rel1
CALL apoc.merge.relationship(b, row.Outcome2, {}, {}, d) YIELD rel as rel2
CALL apoc.merge.relationship(a, row.Outcome, {}, {}, c) YIELD rel as rel3
CALL apoc.merge.relationship(c, row.Outcome2, {}, {}, d) YIELD rel as rel4
RETURN a,b,c,d

Alternatively, how can I wrap this code in py2neo?


Solution

  • I kept the server connection info separately (imported from py_2_neo_pass below), wrapped the whole Cypher statement in a py2neo query string, and executed it with graph.run.

    # py_2_neo_pass is a local module that holds the connection settings.
    from py_2_neo_pass import db_server, db_user, db_password
    from py2neo import Graph
    
    # py2neo's Graph takes host/user/password keyword settings.
    graph = Graph(host=db_server, user=db_user, password=db_password)
    
    query='''
    USING PERIODIC COMMIT 1000
    LOAD CSV WITH HEADERS FROM ("file:///sessions/4_hour_parsed_and_ready.csv") AS row
    MERGE (a:Sender { name: row.From, domain: row.Sender_Sub_Fld})
    MERGE (b:Link { name: row.Url_Sub_Fld, topLevelDomain: row.Url_Tld, htmlEncodedMessage: row.HTML_Encoded})
    MERGE (c:Attachment { name: row.FileHash, fileExtension: row.FileName_Ext, containsMultipleExtensions: row.MultipleExtensions})
    MERGE (d:Recipient { name: row.To})
    WITH a,b,c,d,row
    WHERE NOT row.Url_Tld = "false" AND NOT row.FileHash = "false"
    CALL apoc.merge.relationship(a, row.Outcome, {}, {}, b) YIELD rel as rel1
    CALL apoc.merge.relationship(b, row.Outcome2, {}, {}, d) YIELD rel as rel2
    CALL apoc.merge.relationship(a, row.Outcome, {}, {}, c) YIELD rel as rel3
    CALL apoc.merge.relationship(c, row.Outcome2, {}, {}, d) YIELD rel as rel4
    RETURN a,b,c,d
    '''
    
    graph.run(query)
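
    For the neo4j-admin import half of the question, here is a rough sketch of what the conversion might look like. Instead of one CSV plus Cypher, the tool expects separate node and relationship files whose headers carry the ID, label and type information (name:ID(Sender), :LABEL, :START_ID, :END_ID, :TYPE). The sketch only covers the Sender -> Link part and assumes that name is unique per label; the function names and output file names are made up for illustration, and the URL filter is a simplified stand-in for the WHERE clause above. Keep in mind that neo4j-admin import is aimed at the initial bulk load of an empty database, so for hourly updates the LOAD CSV approach above is probably still the better fit.

```python
import csv
import io


def build_import_tables(rows):
    """Split parsed email rows into node and relationship tables.

    Only the Sender -> Link part of the graph is covered here;
    Attachment and Recipient would follow the same pattern.
    """
    senders = {}   # sender name -> domain (assumes name is unique)
    links = {}     # link name -> top-level domain (assumes name is unique)
    rels = set()   # (sender name, link name, relationship type)

    for row in rows:
        # Simplified version of the WHERE clause: skip rows without a URL.
        if row["Url_Tld"] == "false":
            continue
        senders.setdefault(row["From"], row["Sender_Sub_Fld"])
        links.setdefault(row["Url_Sub_Fld"], row["Url_Tld"])
        rels.add((row["From"], row["Url_Sub_Fld"], row["Outcome"]))

    # Hypothetical file names mapped to (header, data rows).
    return {
        "senders.csv": (
            ["name:ID(Sender)", "domain", ":LABEL"],
            [[name, domain, "Sender"] for name, domain in senders.items()],
        ),
        "links.csv": (
            ["name:ID(Link)", "topLevelDomain", ":LABEL"],
            [[name, tld, "Link"] for name, tld in links.items()],
        ),
        "sender_link.csv": (
            [":START_ID(Sender)", ":END_ID(Link)", ":TYPE"],
            [list(rel) for rel in sorted(rels)],
        ),
    }


def to_csv_text(header, rows):
    """Render one table as CSV text, header line first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

    The rows argument can come straight from csv.DictReader over the hourly file. Each table would then be written to disk and passed to neo4j-admin import through the --nodes and --relationships options mentioned in the question.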