Search code examples
performanceneo4jload-csv

Neo4j load csv performance issues


I am trying to ingest data using load csv command in neo4j 4.0.4 enterprise edition CSV file has 15k rows and 9 columns below is my load csv command:

USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:///data/csv_files/NA/admin_2020-05-18_10-19-23_481000/Relationships.csv" 
AS row WITH row WHERE row.`RELATIONSHIP_NAME` IS NOT NULL OR row.`ASSET_1_ID` IS NOT NULL OR row.`ASSET_2_ID` IS NOT NULL 
MERGE (relationship:Relationship{ name:row.`RELATIONSHIP_NAME`, asset2_id:row.`ASSET_2_ID`, asset1_id:row.`ASSET_1_ID` }) 
SET relationship += { name:row.`RELATIONSHIP_NAME`, asset2_id:row.`ASSET_2_ID`, asset1_id:row.`ASSET_1_ID` , relationship_status:row.`RELATIONSHIP_STATUS`, relationship_type_id:row.`RELATIONSHIP_TYPE_ID`, relationship_type:row.`RELATIONSHIP_TYPE`, asset1_type:row.`ASSET_1_TYPE`, asset1_id:row.`ASSET_1_ID`, asset2_type:row.`ASSET_2_TYPE`, asset2_id:row.`ASSET_2_ID`, relationship_comments:row.`RELATIONSHIP_COMMENTS`, modified_at: TIMESTAMP() }

and there are no relationships that are being created but it took around 7 mins to complete process and also I did explain query which has no eager operations.

dbms.memory.pagecache.size=12100m
dbms.memory.heap.max_size=11900m
dbms.memory.heap.initial_size=11900m

Am I missing something here or is there a way I can optimise the query.How would i get to improve the performance? Thanks in advance


Solution

  • Your problem is probably in merging by too? many properties at once. It is recommended to merge on a single property and then add additional properties. Another thing is to add indexes on the properties you will be using to MERGE on. Your query looks weird though. It looks like you are setting the same properties as you are MERGE-ing on. Without any additional information, I would change your query to:

    USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:///data/csv_files/NA/admin_2020-05-18_10-19-23_481000/Relationships.csv" 
    AS row WITH row WHERE row.`RELATIONSHIP_NAME` IS NOT NULL OR row.`ASSET_1_ID` IS NOT NULL OR row.`ASSET_2_ID` IS NOT NULL 
    CREATE (relationship:Relationship) 
    SET relationship += { name:row.`RELATIONSHIP_NAME`, asset2_id:row.`ASSET_2_ID`, asset1_id:row.`ASSET_1_ID` , relationship_status:row.`RELATIONSHIP_STATUS`, relationship_type_id:row.`RELATIONSHIP_TYPE_ID`, relationship_type:row.`RELATIONSHIP_TYPE`, asset1_type:row.`ASSET_1_TYPE`, asset1_id:row.`ASSET_1_ID`, asset2_type:row.`ASSET_2_TYPE`, asset2_id:row.`ASSET_2_ID`, relationship_comments:row.`RELATIONSHIP_COMMENTS`, modified_at: TIMESTAMP() }