I'm currently importing some relationships into my graph using the Bolt driver in .NET. I wanted to try the LOAD CSV
command for this case (the source is a CSV file) and compare performance, but the query is only applied to the first row. I tested with SKIP n LIMIT 1
and only managed to make it run row by row.
I'm thus wondering whether there are any restrictions on "complex" queries in a LOAD CSV loop?
Here is the query:
using periodic commit
LOAD CSV FROM "file:///path/to/my/file.csv" AS row fieldterminator ';'
with row
MATCH (n:Source {id:row[0]})
MATCH p=(o:Target {num:row[1]})-[:Version*..]->()
WHERE row[2] in labels(o)
WITH n, p ORDER BY LENGTH(p) DESC LIMIT 1
WITH n, last(nodes(p)) as m
MERGE (n)-[r:Rel]->(m);
Thanks!
My CSV is just a regular 3-column CSV following this pattern:
IDTEXT0000000001;V150;LabelOne
IDTEXT0000000002;M245;LabelOne
IDTEXT0000000003;D666;Labeltwo
etc.
By "row by row" I mean that I first tested with a `limit 50` after `with row`, and as it did not work (nothing was added) I then did `limit 1`, `skip 1 limit 1`, `skip 2 limit 2`, etc. The "row by row" method works, but you'll admit that it's not really what you want to do.
using periodic commit
LOAD CSV FROM "file:///path/to/my/file.csv" AS row fieldterminator ';'
with row
MATCH (n:Source {id:row[0]})
MATCH p=(o:Target {num:row[1]})-[:Version*..]->()
WHERE row[2] in labels(o)
WITH n, p ORDER BY LENGTH(p) DESC
WITH n, last(nodes(collect(p)[0])) as m
MERGE (n)-[r:Rel]->(m);
And with APOC (slightly faster):
using periodic commit
LOAD CSV FROM "file:///path/to/my/file.csv" AS row fieldterminator ';'
with row
MATCH (n:Source {id:row[0]})
call apoc.cypher.run('MATCH p=(o:Article {num:$num})-[:Version*0..]->() WHERE $label in labels(o) WITH p ORDER BY LENGTH(p) DESC LIMIT 1 return last(nodes(p)) as m', {num:row[1], label:row[2]})
yield value
with n, value.m as m
MERGE (n)-[r:Rel]->(m);
But using Bolt allows me to build a query without the label test, and it is still 3 to 4 times faster than with LOAD CSV. Thanks for helping :)
The problem is in your use of LIMIT within the query:
WITH n, p ORDER BY LENGTH(p) DESC LIMIT 1
This doesn't limit on a per-row basis; LIMIT applies to ALL rows. Where you had multiple rows for each n (from your CSV) and multiple p paths, after this LIMIT is applied you have only a single row, with one n and one p, and consequently a single MERGE operation.
You should read up on how to limit results per row. Once you fix that, your query should be fine.
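As a minimal sketch of one way to limit per row (not necessarily the fastest), you could replace the global LIMIT 1 with an aggregation: sort the paths, then take the first element of collect(p) for each n. This reuses the labels and properties from the question and assumes that each Source id appears on only one CSV line; if the same id can appear on several lines, the grouping key would need to include something that identifies the line, such as row[1] and row[2].

```cypher
LOAD CSV FROM "file:///path/to/my/file.csv" AS row FIELDTERMINATOR ';'
MATCH (n:Source {id: row[0]})
// Variable-length match, as in the question
MATCH p = (o:Target {num: row[1]})-[:Version*]->()
WHERE row[2] IN labels(o)
// Sort all paths longest first; no LIMIT here, so no rows are discarded
WITH n, p ORDER BY length(p) DESC
// collect() aggregates per distinct n, so [0] is the longest path for that n
WITH n, collect(p)[0] AS longest
WITH n, last(nodes(longest)) AS m
MERGE (n)-[:Rel]->(m);
```

Because collect() is an aggregating function, the non-aggregated n acts as the grouping key, which is what gives one result per source node instead of one result overall.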