Unexpected results from apoc.coll.zip()


I created 3 test nodes whose name properties are "a", "b", and "c", and used apoc.coll.zip() to combine two lists:

MATCH (n:test) 
WITH collect(n.name) as nodes 
WITH apoc.coll.zip(nodes, range(0, size(nodes))) as pairs 
RETURN pairs;

+--------------------------------+
| pairs                          |
+--------------------------------+
| [["a", 0], ["b", 1], ["c", 2]] |
+--------------------------------+
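A side note on the indices: Cypher's range(start, end) includes its upper bound, so range(0, size(nodes)) returns [0, 1, 2, 3] for three names. The output above shows apoc.coll.zip() produced only three pairs, so the surplus index was dropped here. A sketch that keeps both lists the same length (assuming the same three :test nodes) subtracts 1:

```cypher
// Assumes the same three :test nodes named "a", "b", "c".
MATCH (n:test)
WITH collect(n.name) AS nodes
// range() is inclusive: range(0, 3) is [0, 1, 2, 3], one more index than names.
RETURN range(0, size(nodes)) AS inclusiveRange,
       // size(nodes) - 1 yields exactly one index per name.
       apoc.coll.zip(nodes, range(0, size(nodes) - 1)) AS pairs;
```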

The result is as expected. Things get interesting when I modify the query, either by adding another column to the RETURN clause or by UNWINDing pairs:

1. RETURN pairs,n.name;

MATCH (n:test)
WITH n, collect(n.name) as nodes
WITH n, apoc.coll.zip(nodes, range(0, size(nodes))) as pairs 
RETURN pairs,n.name;

+------------+--------+
| pairs      | n.name |
+------------+--------+
| [["a", 0]] | "a"    |
| [["b", 0]] | "b"    |
| [["c", 0]] | "c"    |
+------------+--------+

I expected the result to be exactly the same as that of this query:

MATCH (n:test) 
WITH n, [["a", 0], ["b", 1], ["c", 2]] as nested 
RETURN nested, n.name;

+--------------------------------+--------+
| nested                         | n.name |
+--------------------------------+--------+
| [["a", 0], ["b", 1], ["c", 2]] | "a"    |
| [["a", 0], ["b", 1], ["c", 2]] | "b"    |
| [["a", 0], ["b", 1], ["c", 2]] | "c"    |
+--------------------------------+--------+

2. UNWIND pairs as pair RETURN pairs

MATCH (n:test)
WITH n, collect(n.name) as nodes
WITH n, apoc.coll.zip(nodes, range(0, size(nodes))) as pairs
UNWIND pairs as pair
RETURN pairs;

+------------+
| pairs      |
+------------+
| [["a", 0]] |
| [["b", 0]] |
| [["c", 0]] |
+------------+

I expected the result to be no different from the query without the UNWIND clause:

+--------------------------------+
| pairs                          |
+--------------------------------+
| [["a", 0], ["b", 1], ["c", 2]] |
+--------------------------------+

3. UNWIND pairs as pair RETURN pair

MATCH (n:test)
WITH n, collect(n.name) as nodes
WITH n, apoc.coll.zip(nodes, range(0, size(nodes))) as pairs
UNWIND pairs as pair
RETURN pair;

+----------+
| pair     |
+----------+
| ["a", 0] |
| ["b", 0] |
| ["c", 0] |
+----------+

I expected the result to be no different from simply UNWINDing a nested list:

UNWIND [["a", 0], ["b", 1], ["c", 2]] as list 
RETURN list;

+----------+
| list     |
+----------+
| ["a", 0] |
| ["b", 1] |
| ["c", 2] |
+----------+

Why does this happen? The behavior doesn't seem to be explained in the RETURN or UNWIND documentation.


Solution

  • In all three queries, the key line is:

    ...
    WITH n, collect(n.name) as nodes
    ...
    

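    collect() is an aggregating function, and every non-aggregated expression in the same WITH clause (here n) becomes an implicit grouping key, much like GROUP BY in SQL. Each node is its own group, so collect(n.name) returns a one-element list per node (["a"], ["b"], ["c"]). That is why you get 3 rows for 3 nodes, and why apoc.coll.zip() pairs every name with 0: in each row it only sees a single name and the range [0, 1].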

    To debug, RETURN right after the WITH and inspect the intermediate result at each step, like so:

    MATCH (n:test)
    WITH n, collect(n.name) as nodes
    RETURN n, nodes
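Following that reasoning, one way to fix the queries is to aggregate without n in the WITH, so that collect() sees every node, and only bring per-node rows back afterwards with UNWIND. The sketch below assumes the same :test nodes; the ORDER BY before collect() is an addition of mine so the name-to-index assignment is deterministic, since otherwise collect() follows whatever order MATCH happens to produce. The first query corresponds to query 3 (one row per pair), the second to query 1 (the full pairs list next to each n.name):

```cypher
// Query 3, corrected: no grouping key, so collect() aggregates ALL :test nodes.
MATCH (n:test)
WITH n ORDER BY n.name
WITH collect(n.name) AS nodes
WITH apoc.coll.zip(nodes, range(0, size(nodes) - 1)) AS pairs
UNWIND pairs AS pair
RETURN pair;

// Query 1, corrected: collect the nodes themselves, build the pairs once,
// then UNWIND the node list to get one row per node again.
MATCH (n:test)
WITH n ORDER BY n.name
WITH collect(n) AS nodes
WITH nodes, apoc.coll.zip([x IN nodes | x.name], range(0, size(nodes) - 1)) AS pairs
UNWIND nodes AS n
RETURN pairs, n.name;
```

With the sample data, the first query should return ["a", 0], ["b", 1] and ["c", 2] as three rows, and the second should show the full list [["a", 0], ["b", 1], ["c", 2]] beside "a", "b" and "c". Note that even with the aggregation fixed, UNWIND pairs AS pair followed by RETURN pairs (query 2) still yields three rows, each holding the whole list, because UNWIND emits one row per list element and repeats the other variables on each row; to get a single row, drop the UNWIND.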