How to batch process millions of nodes and save the result into a file

I have the following schema in Neo4j:

(:Foo)-[:HAS]->(:Bar)
(:Foo)<-[:IN]-(:Baz)

There are tens of millions of (:Foo) nodes, with potentially millions of relationships between (:Foo) and (:Baz).

I want to get the biggest (:Foo) nodes (in terms of the number of relationships between (:Foo) and (:Baz)) that do not have any relationship to (:Bar).

I was trying:

MATCH (f:Foo) WHERE NOT (f)-[:HAS]->(:Bar)
WITH f, count([(f)<-[:IN]-()]) as b_count WHERE b_count > 10 RETURN f, b_count

but that query never finishes.

I have also tried using apoc.periodic.iterate, but I don't know how to get the result.

CALL apoc.periodic.iterate(
"MATCH (f:Foo) WHERE NOT (f)-[:HAS]->(:Bar) RETURN f", 
"WITH f, count([(f)<-[:IN]-()]) as b_count WHERE b_count > 10 RETURN f, b_count",
 {parallel:true, batchSize:1000})

Ideally, I would like to get the results sorted by b_count and return only the N biggest.

Sorting all the results to only get the biggest N might be too memory-demanding. If the results could be saved to a file, I could use sort to order the results afterwards.

EDIT:

If possible, the query should be neo4j 3.5 compatible.

Solution

As mentioned in this answer, COUNT subqueries allow you to take advantage of the very efficient getDegree operation (by avoiding any DB hits).

If all HAS relationships from a Foo node end in a Bar node, then you can simplify your first pattern to (f)-[:HAS]->() to take advantage of the getDegree operation twice in the same query:

MATCH (f:Foo)
WHERE COUNT { (f)-[:HAS]->() } = 0
WITH f, COUNT { (f)<-[:IN]-() } AS b_count
WHERE b_count > 10
RETURN f, b_count

This query should be very fast.
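To return only the N biggest, as the question asks, an ORDER BY with a LIMIT can be appended. The following is a sketch, assuming N = 100 (an arbitrary value picked for illustration). When ORDER BY is combined with LIMIT, the planner can usually use a Top operator, which keeps only the N best rows instead of sorting the whole result, so memory use should stay modest. Running the query with PROFILE will confirm whether that is the case on your version.

```cypher
// Foo nodes with no outgoing HAS relationship, ranked by incoming IN degree.
MATCH (f:Foo)
WHERE COUNT { (f)-[:HAS]->() } = 0
WITH f, COUNT { (f)<-[:IN]-() } AS b_count
WHERE b_count > 10
RETURN f, b_count
ORDER BY b_count DESC
LIMIT 100  // assumed value of N; adjust as needed
```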

Prior to neo4j 5.0

If you are using a version of neo4j older than 5.0, you should be able to replace COUNT { ... } with SIZE(...) to use the getDegree operation. Here is a knowledge base article about that.
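For neo4j 3.5, the same idea might look like the sketch below. The first statement is the query above with SIZE(...) substituted for COUNT { ... }. The second writes the unsorted rows to a CSV file through apoc.export.csv.query, so they can be sorted outside the database. The file name foo_counts.csv is a hypothetical placeholder, the export assumes APOC is installed with apoc.export.file.enabled=true set in neo4j.conf, and it returns id(f) rather than the whole node to keep the file small.

```cypher
// Pre-5.0 form: SIZE() over a pattern expression instead of a COUNT subquery.
MATCH (f:Foo)
WHERE size((f)-[:HAS]->()) = 0
WITH f, size((f)<-[:IN]-()) AS b_count
WHERE b_count > 10
RETURN f, b_count;

// Optional: export the same rows to a CSV file (file name is a placeholder).
CALL apoc.export.csv.query(
  "MATCH (f:Foo) WHERE size((f)-[:HAS]->()) = 0 " +
  "WITH f, size((f)<-[:IN]-()) AS b_count WHERE b_count > 10 " +
  "RETURN id(f) AS foo_id, b_count",
  "foo_counts.csv",
  {}
);
```

The resulting file can then be ordered with an external tool such as sort, as suggested in the question.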