Tags: java, neo4j, cypher, bolt

Neo4j - Reading large amounts of data with Java


I'm currently trying to read large amounts of data into my Java application using the official Bolt driver. I'm having issues because the graph is fairly large (~17k nodes, ~500k relationships) and of course I'd like to read this in chunks for memory efficiency. What I'm trying to get is a mix of fields between the origin and destination nodes, as well as the relationship itself. I tried writing a pagination query:

MATCH (n:NodeLabel)-[r:RelationshipLabel]->(m:NodeLabel)
WITH r.some_date AS some_date, r.arrival_times AS arrival_times,
     r.departure_times AS departure_times, r.path_ids AS path_ids,
     n.node_id AS origin_node_id, m.node_id AS dest_node_id
ORDER BY id(r)
RETURN some_date, arrival_times, departure_times, path_ids,
       origin_node_id, dest_node_id
LIMIT 5000

(I changed some of the label and field names so it's not obvious what the query is for.)

The idea was that I'd use SKIP on subsequent queries to read more data. However, at 5000 rows per read this takes roughly 7 seconds per query, presumably because of the full scan that ORDER BY forces, and as soon as I add SKIP both execution time and memory usage go up significantly. This is far too slow to read the whole graph. Is there any way I can speed up the query, or stream the results in chunks into my app? In general, what is the best approach to reading large amounts of data?

Thanks in advance.


Solution

  • Instead of SKIP, filter on the relationship id: from the second call onwards add WHERE id(r) > $lastId, where $lastId is the largest id(r) returned by the previous batch. Each batch then starts where the last one left off, so the server never has to produce and discard the rows you have already read, and the processing time should actually go down as you page through rather than up.
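
A minimal sketch of what that keyset-style pagination could look like with the official Java driver. This assumes a 4.x/5.x org.neo4j.driver API; the URI, credentials, batch size, and label/property names are placeholders based on the anonymized query in the question:

import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Record;
import org.neo4j.driver.Result;
import org.neo4j.driver.Session;
import org.neo4j.driver.Values;

public class BatchReader {

    // Page through relationships by id(r) instead of SKIP:
    // each batch resumes after the last id seen in the previous batch.
    private static final String BATCH_QUERY =
        "MATCH (n:NodeLabel)-[r:RelationshipLabel]->(m:NodeLabel) " +
        "WHERE id(r) > $lastId " +
        "WITH r, n, m ORDER BY id(r) LIMIT $batchSize " +
        "RETURN id(r) AS rel_id, r.some_date AS some_date, " +
        "       r.arrival_times AS arrival_times, r.departure_times AS departure_times, " +
        "       r.path_ids AS path_ids, n.node_id AS origin_node_id, m.node_id AS dest_node_id";

    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {

            long lastId = -1L;      // relationship ids are non-negative, so -1 includes everything
            int batchSize = 5000;
            boolean more = true;

            while (more) {
                Result result = session.run(BATCH_QUERY,
                        Values.parameters("lastId", lastId, "batchSize", batchSize));

                more = false;
                while (result.hasNext()) {
                    Record row = result.next();
                    lastId = row.get("rel_id").asLong();   // remember the cursor position
                    more = true;
                    // process the row here, e.g.:
                    // long origin = row.get("origin_node_id").asLong();
                }
            }
        }
    }
}

Because each query only orders and returns the rows after the cursor, nothing is skipped and re-sorted on the server, and keeping the driver and session open across batches avoids reconnect overhead. Note that on Neo4j 5, elementId() is the recommended replacement for the deprecated id(), but the same pattern applies.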