Search code examples
neo4jcypherunioncorrelated-subquery

Cypher: Correlated variables in Union


I need to answer the following query: Top-three store cities based on store sales and store sales for all the other cities combined.

With the following query

// Top 3 cities
MATCH (t:Store)<-[:HasStore]-(s:Sales)
WITH t.StoreCity AS StoreCity, t, sum(s.StoreSales) AS StoreSales
ORDER BY sum(s.StoreSales) DESC LIMIT 3
// All other cities
WITH StoreCity, StoreSales, collect(t) AS TopThreeCities
MATCH (t1:Store)<-[:HasStore]-(s:Sales)
WHERE NOT(t1 IN TopThreeCities)
// JOIN of the two results -> I would need the UNION
RETURN StoreCity, StoreSales, "Other cities" AS StoreCity1, sum(s.StoreSales) AS StoreSales1

I was able to obtain the following answer

"A" 10  "Other Cities"  50
"B" 9   "Other Cities"  50
"C" 8   "Other Cities"  50

But I would like to obtain

"A" 10
"B" 9
"C" 8
"Other Cities"  50

Any idea how to obtain this ? I have tried many possibilities without any success :-(


Solution

  • Using a UNION:

    // Top 3 cities
    MATCH (t:Store)<-[:HasStore]-(s:Sales)
    RETURN t.StoreCity AS StoreCity, sum(s.StoreSales) AS StoreSales
    ORDER BY sum(s.StoreSales) DESC LIMIT 3
    // All other cities
    UNION
    MATCH (t:Store)<-[:HasStore]-(s:Sales)
    WITH t.StoreCity, sum(s.StoreSales) AS AllStoreSales
    ORDER BY AllStoreSales DESC SKIP 3
    RETURN "Other cities" AS StoreCity, sum(AllStoreSales) AS StoreSales
    

    The UNION runs two distinct queries, so before the UNION we find the top three selling cities and return their names and counts, as you had originally

    After the UNION, the same basic query is run again to find the sum of sales by city, ordered by descending sales, then the top three results are skipped. The remaining city's sales are summed and returned as Other cities.

    The two parts of the UNION are completely separate, but both queries have to return the same number of columns with the same column names and in the same column order.