
Cypher/Neo4j: what is the most efficient way to compose filter clauses programmatically?


I'm using the Java driver to fetch data from a Neo4j database. I have a function that receives a Set<ChrRegion> chrRegions, and I have to use that parameter to build a query like this (in pseudo-code; I know chrRegions isn't an array):

MATCH ( g:Gene )
WHERE
  g.chromosomeID = ${chrRegions [ 0 ].id} AND g.chromosomeBegin >= ${chrRegions [ 0 ].begin} AND g.chromosomeEnd <= ${chrRegions [ 0 ].end}
  OR g.chromosomeID = ${chrRegions [ 1 ].id} AND g.chromosomeBegin >= ${chrRegions [ 1 ].begin} AND g.chromosomeEnd <= ${chrRegions [ 1 ].end}
  OR ... // for all the elements in chrRegions

(meaning that a gene matches if it lies within one of the given begin/end regions of a named chromosome)

Building this query as a string is pretty straightforward, but I wonder whether it's the most efficient way (many users might end up running this kind of query).

In other languages (Lucene, SQL, SPARQL), queries can be built programmatically by assembling tokens. Is there something similar in the Neo4j driver? I suspect not, since the Query class contains only the query string and its parameters, with no structure, and the Bolt driver probably requires the string version anyway. But I'd like to check whether there are further ideas about it.


Solution

  • There is no need to "programmatically compose a filter", since you can just pass your entire filter data array directly to a static Cypher query as a Cypher parameter. This is the most efficient (and recommended) way to do what you want. Not only would this approach avoid forcing the server to re-compile the query every time you have different input data, but it greatly simplifies your Java code, and also helps to avoid "Cypher injection" attacks.

    For example, if $chRegions is passed as a parameter holding a list of maps, each with id, begin and end keys:

    UNWIND $chRegions AS r
    MATCH (g:Gene)
    WHERE
      g.chromosomeID = r.id AND
      g.chromosomeBegin >= r.begin AND
      g.chromosomeEnd <= r.end
    RETURN g
    

    This should be especially quick if you have an index on the chromosomeID property of the Gene label.
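
On the Java side, the Set<ChrRegion> has to be turned into a list of maps before it can be passed as the $chRegions parameter. The following is only a minimal sketch: the ChrRegion class and its id/begin/end fields are hypothetical stand-ins for the asker's own class, RegionParams and toParameters are names invented for illustration, and the query uses the range comparisons from the question. The driver call is shown only as a comment, so the sketch runs without a database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RegionParams {

    // Hypothetical stand-in for the asker's ChrRegion class; field names are assumed.
    static final class ChrRegion {
        final String id;
        final long begin;
        final long end;

        ChrRegion(String id, long begin, long end) {
            this.id = id;
            this.begin = begin;
            this.end = end;
        }
    }

    // Static query text: only the parameter value changes between calls.
    static final String QUERY =
        "UNWIND $chRegions AS r\n"
        + "MATCH (g:Gene)\n"
        + "WHERE g.chromosomeID = r.id\n"
        + "  AND g.chromosomeBegin >= r.begin\n"
        + "  AND g.chromosomeEnd <= r.end\n"
        + "RETURN g";

    // Converts the regions into a list of maps keyed by id/begin/end,
    // wrapped in the parameter map under the name used in the query.
    static Map<String, Object> toParameters(Set<ChrRegion> chrRegions) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (ChrRegion region : chrRegions) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("id", region.id);
            row.put("begin", region.begin);
            row.put("end", region.end);
            rows.add(row);
        }
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("chRegions", rows);
        return params;
    }

    public static void main(String[] args) {
        Set<ChrRegion> regions = new LinkedHashSet<>();
        regions.add(new ChrRegion("1", 1000L, 5000L));
        regions.add(new ChrRegion("X", 200L, 900L));

        Map<String, Object> params = toParameters(regions);
        System.out.println(params);

        // With an open Neo4j driver session, the call would look like:
        // session.run(QUERY, params);
    }
}
```

Note that UNWIND produces one row per region, so a gene that falls inside two overlapping regions would be returned twice; RETURN DISTINCT g avoids that if it matters.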