I have a GraphAware time tree and a spatial R-tree set up to reference a large set of nodes in my graph. I am trying to search these records by time and space.
Individually I can gather results from these queries in about 5 seconds:
WITH
({start:1300542000000,end:1350543000000}) as tr
CALL ga.timetree.events.range(tr) YIELD node as n
RETURN count(n);
> ~ 500000 results
WITH
({lon:120.0,lat:20.0}) as smin, ({lon:122.0,lat:21.0}) as smax
CALL spatial.bbox('spatial_records', smin, smax) YIELD node as n
RETURN count(n);
> ~ 30000 results
When I try to combine and filter these results, the performance drops drastically. Neo4j is already using a large amount of memory on my system, so I am under the impression that the memory footprint of this command is too much for my system and that the query will never finish. (I am using the neo4j-shell to run these commands.)
WITH
({start:1300542000000,end:1350543000000}) as tr,
({lon:120.0,lat:20.0}) as smin, ({lon:122.0,lat:21.0}) as smax
CALL ga.timetree.events.range(tr) YIELD node as n
CALL spatial.bbox('spatial_records', smin, smax) YIELD node as m
WITH COLLECT(n) as nn, COLLECT(m) as mm
RETURN FILTER(x in nn WHERE x in mm);
I am wondering what the best way to efficiently filter the results of these two statement calls is. I attempted to use the REDUCE clause, but couldn't quite figure out the syntax.
As a side question, given that this is the most common type of query that I will issue to my database, is this a good way to do things (as in using the time tree and r tree referencing the same set of nodes)? I haven't found any other tools in neo4j that support indexing both space and time in a single structure, so this is my current implementation.
The first procedure returns 500k nodes, and collecting them is a costly operation, so yes, this will be very memory-heavy.
I would start from whichever call returns fewer nodes, and then filter with plain Cypher rather than a second procedure. Here that means replacing the call to the timetree procedure with a range filter in Cypher.
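There is a second cost besides the collect. As far as I understand Cypher's row semantics, a CALL runs once for every incoming row, so chaining the two procedures produces every (n, m) combination before anything is collected. The toy query below is only a sketch of that behaviour: it uses UNWIND over small ranges as a stand-in for the two procedures, and the aliases are made up for the illustration.

```cypher
// Stand-ins for the two procedure calls: 3 "time" rows and 2 "spatial" rows.
UNWIND range(1, 3) AS n
// The second row-producing clause runs once per incoming row.
UNWIND range(1, 2) AS m
// Both lists are built from the 3 * 2 = 6 combined rows.
WITH collect(n) AS nn, collect(m) AS mm
RETURN size(nn) AS nn_size, size(mm) AS mm_size;
// nn_size = 6, mm_size = 6 (not 3 and 2)
```

With roughly 500,000 and 30,000 rows instead of 3 and 2, the same multiplication gives on the order of 15 billion rows, which would explain why the combined query never finishes.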
Assuming you have an indexed timestamp property on your nodes:
WITH
({lon:120.0,lat:20.0}) as smin, ({lon:122.0,lat:21.0}) as smax
CALL spatial.bbox('spatial_records', smin, smax) YIELD node as m
WITH m
WHERE m.timestamp > 1300542000000 and m.timestamp < 1350543000000
RETURN m
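If the timestamp index does not exist yet, the following is a minimal sketch of how it could be created and checked with a time-only range query. The label :Record is hypothetical (substitute whatever label your nodes carry), and the index syntax is the Neo4j 3.x form; newer versions use a different CREATE INDEX syntax.

```cypher
// Hypothetical label :Record; replace with your own node label.
// Neo4j 3.x schema index syntax.
CREATE INDEX ON :Record(timestamp);

// Time-only range query that can be served by the index above.
MATCH (n:Record)
WHERE n.timestamp > 1300542000000 AND n.timestamp < 1350543000000
RETURN count(n);
```

Note that in the spatial query above the timestamp predicate is applied to the nodes already yielded by the procedure, so it is a per-node property check; the index mainly helps queries like this one that start from the time range.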
I wouldn't recommend removing the timetree (otherwise I would be fired, just kidding). For some time queries the timetree will outperform a ranged property query, especially when the resolution is high (millisecond) and you have a lot of closely consecutive timestamps.
Otherwise, you seem to have a very good use case. It would be nice if you could send more details on the Neo4j Slack or privately (christophe at graphaware dot com). This could help Neo4j and GraphAware support more of this via procedures in a better way, for example by passing in a collection of nodes and filtering out those outside the range, or by a smooth combination with spatial, as long as it is generic enough.
In the meantime, as you are using open source products, you could easily create a procedure that combines the two procedures for your specific use case.