Tags: csv, neo4j, cypher, load-csv

Cypher restrictions on queries chained after a load csv


I'm currently importing some relationships into my graph using the Bolt driver in .NET. Since the source is a CSV file, I wanted to try the LOAD CSV command for this case and compare performance, but the query is only applied to the first row. I tested with SKIP n LIMIT 1 and only managed to make it run row by row.

So I'm wondering: are there any restrictions on "complex" queries in a LOAD CSV loop?

Here is the query:

```cypher
USING PERIODIC COMMIT
LOAD CSV FROM "file:///path/to/my/file.csv" AS row FIELDTERMINATOR ';'
WITH row
MATCH (n:Source {id: row[0]})
MATCH p = (o:Target {num: row[1]})-[:Version*..]->()
WHERE row[2] IN labels(o)
WITH n, p ORDER BY LENGTH(p) DESC LIMIT 1
WITH n, last(nodes(p)) AS m
MERGE (n)-[r:Rel]->(m);
```

Thanks!

Edit:

My CSV is just a regular three-column file following this pattern:

IDTEXT0000000001;V150;LabelOne
IDTEXT0000000002;M245;LabelOne
IDTEXT0000000003;D666;Labeltwo
etc.

By "row by row" I mean this: I first tested with `LIMIT 50` after `WITH row`, and as that did not work (nothing was added), I then tried `LIMIT 1`, `SKIP 1 LIMIT 1`, `SKIP 2 LIMIT 2`, etc. The row-by-row method works, but you'll admit it's not really what you want to do.
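One way to see where the rows disappear is to run the same pipeline read-only, with the MERGE swapped for a count. This is a sketch reusing the file path, labels and properties from the query above; `rowsReachingMerge` is just an illustrative alias. Because the `LIMIT 1` after the `ORDER BY` applies to the whole result stream, the count can only be 0 or 1, however many CSV lines get past `LIMIT 50`.

```cypher
// Read-only check: same pipeline as the question, MERGE replaced by a count
LOAD CSV FROM "file:///path/to/my/file.csv" AS row FIELDTERMINATOR ';'
WITH row LIMIT 50
MATCH (n:Source {id: row[0]})
MATCH p = (o:Target {num: row[1]})-[:Version*..]->()
WHERE row[2] IN labels(o)
// This LIMIT is global: it keeps one row out of everything produced above
WITH n, p ORDER BY LENGTH(p) DESC LIMIT 1
RETURN count(*) AS rowsReachingMerge;
```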

Final code:

```cypher
USING PERIODIC COMMIT
LOAD CSV FROM "file:///path/to/my/file.csv" AS row FIELDTERMINATOR ';'
WITH row
MATCH (n:Source {id: row[0]})
MATCH p = (o:Target {num: row[1]})-[:Version*..]->()
WHERE row[2] IN labels(o)
WITH n, p ORDER BY LENGTH(p) DESC
WITH n, last(nodes(collect(p)[0])) AS m
MERGE (n)-[r:Rel]->(m);
```
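One caveat with the `collect()` version, assuming a source id can appear on more than one CSV line: the aggregation groups by `n` only, so all lines that share the same Source node collapse into a single group and only one relationship is merged for them. A sketch that also keeps `row` as a grouping key (otherwise the same query, labels and file as above) yields one result per CSV line:

```cypher
USING PERIODIC COMMIT
LOAD CSV FROM "file:///path/to/my/file.csv" AS row FIELDTERMINATOR ';'
MATCH (n:Source {id: row[0]})
MATCH p = (o:Target {num: row[1]})-[:Version*..]->()
WHERE row[2] IN labels(o)
// Carry row along so it can be used as a grouping key below
WITH row, n, p ORDER BY LENGTH(p) DESC
// Grouping on row and n gives one group per CSV line; [0] is the longest path
WITH row, n, collect(p)[0] AS longest
WITH n, last(nodes(longest)) AS m
MERGE (n)-[r:Rel]->(m);
```

Identical duplicate lines in the CSV still end up in one group, which is harmless here since MERGE would not create a second relationship anyway.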

And with APOC (slightly faster):

```cypher
USING PERIODIC COMMIT
LOAD CSV FROM "file:///path/to/my/file.csv" AS row FIELDTERMINATOR ';'
WITH row
MATCH (n:Source {id: row[0]})
CALL apoc.cypher.run('MATCH p=(o:Target {num:$num})-[:Version*0..]->() WHERE $label IN labels(o) WITH p ORDER BY LENGTH(p) DESC LIMIT 1 RETURN last(nodes(p)) AS m', {num: row[1], label: row[2]})
YIELD value
WITH n, value.m AS m
MERGE (n)-[r:Rel]->(m);
```

But using Bolt lets me build a query without the label test, and it is still 3 to 4 times faster than LOAD CSV. Thanks for helping :)


Solution

  • The problem is in your use of LIMIT within the query:

    ```cypher
    WITH n, p ORDER BY LENGTH(p) DESC LIMIT 1
    ```

    This doesn't limit on a per-row basis: LIMIT applies to ALL rows. You had multiple rows, each with an n (from your CSV) and one or more paths p, but after this LIMIT is applied only a single row survives: one n, one p, and therefore a single MERGE operation.

    You need to limit the results per CSV row instead (for example, by aggregating the paths with collect() and taking the first element). Once you fix that, your query should be fine.
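On a Neo4j version with `CALL { }` subqueries (4.x onward), a per-row limit can also be written as a subquery: it runs once for each incoming row, so its `ORDER BY ... LIMIT 1` picks the longest path per CSV line rather than over the whole import. A minimal sketch, assuming the same labels, properties and file as in the question. `USING PERIODIC COMMIT` is left out; Neo4j 5 removed it in favour of `CALL { ... } IN TRANSACTIONS`, so batching would need adapting to the version in use.

```cypher
LOAD CSV FROM "file:///path/to/my/file.csv" AS row FIELDTERMINATOR ';'
MATCH (n:Source {id: row[0]})
CALL {
  // Import the current CSV row into the subquery
  WITH row
  MATCH p = (o:Target {num: row[1]})-[:Version*..]->()
  WHERE row[2] IN labels(o)
  // Evaluated separately for each outer row, not over the whole result
  WITH p ORDER BY LENGTH(p) DESC LIMIT 1
  RETURN last(nodes(p)) AS m
}
MERGE (n)-[r:Rel]->(m);
```

A line with no matching Target path makes the subquery return no rows, so that line is simply skipped, just as with the plain MATCH in the original query.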