sql postgresql aggregate

How to aggregate using distinct values across two columns?


I have the following data in an orders table:

revenue       expenses        location_1      location_2
3             6               London          New York
6             11              Paris           Toronto
1             8               Houston         Sydney
1             4               Chicago         Los Angeles
2             5               New York        London
7             11              New York        Boston
4             6               Toronto         Paris
5             11              Toronto         New York
1             2               Los Angeles     London
0             0               Mexico City     London

I would like to create a result set that has 3 columns:

  1. a list of the 10 DISTINCT city names
  2. the sum of revenue for each city
  3. the sum of expenses for each city

The desired result is:

location         revenue    expenses
London           6          13
New York         17         33
Paris            10         17
Toronto          15         28
Houston          1          8
Sydney           1          8
Chicago          1          4
Los Angeles      2          6
Boston           7          11
Mexico City      0          0

Is it possible to aggregate on distinct values across two columns? If yes, how would I do it?

Here is a fiddle:
http://sqlfiddle.com/#!9/0b1105/1
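
In case the fiddle is unavailable, the sample data can be recreated locally with something like the script below. The column types (integer and text) are an assumption, since the question does not state them; the rows are copied from the table above.

```sql
-- Assumed column types: the question only shows the values.
CREATE TABLE orders (
   revenue    integer,
   expenses   integer,
   location_1 text,
   location_2 text
);

-- The ten sample rows from the question, in the same order.
INSERT INTO orders (revenue, expenses, location_1, location_2) VALUES
   (3,  6, 'London',      'New York'),
   (6, 11, 'Paris',       'Toronto'),
   (1,  8, 'Houston',     'Sydney'),
   (1,  4, 'Chicago',     'Los Angeles'),
   (2,  5, 'New York',    'London'),
   (7, 11, 'New York',    'Boston'),
   (4,  6, 'Toronto',     'Paris'),
   (5, 11, 'Toronto',     'New York'),
   (1,  2, 'Los Angeles', 'London'),
   (0,  0, 'Mexico City', 'London');
```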


Solution

  • Yes. Stack both location columns into one with UNION ALL, then aggregate. Shorter (and often faster):

    SELECT location, sum(revenue) AS rev, sum(expenses) AS exp
    FROM  (
       SELECT location_1 AS location, revenue, expenses FROM orders
       UNION ALL
       SELECT location_2            , revenue, expenses FROM orders
       ) sub
    GROUP  BY 1;
    

    Reading the table only once through a CTE may be faster:

    WITH cte AS (
       SELECT location_1, location_2, revenue AS rev, expenses AS exp
       FROM orders
       )
    SELECT location, sum(rev) AS rev, sum(exp) AS exp
    FROM  (
       SELECT location_1 AS location, rev, exp FROM cte
       UNION ALL
       SELECT location_2            , rev, exp FROM cte
       ) sub
    GROUP  BY 1;
    

    The (materialized!) CTE adds overhead, which may outweigh the benefit of scanning the table only once. Which variant wins depends on many factors, such as total table size, available indexes, possible bloat, available RAM, storage speed, and the Postgres version.
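
Since the outcome depends on the setup, one way to check which variant is faster on a given system is to compare both with EXPLAIN. This is only a sketch of that comparison, using the two queries above unchanged; the timings it reports will differ from machine to machine.

```sql
-- Variant 1: plain subquery, scans orders twice.
EXPLAIN (ANALYZE, BUFFERS)
SELECT location, sum(revenue) AS rev, sum(expenses) AS exp
FROM  (
   SELECT location_1 AS location, revenue, expenses FROM orders
   UNION ALL
   SELECT location_2            , revenue, expenses FROM orders
   ) sub
GROUP  BY 1;

-- Variant 2: CTE referenced twice, scans orders once.
EXPLAIN (ANALYZE, BUFFERS)
WITH cte AS (
   SELECT location_1, location_2, revenue AS rev, expenses AS exp
   FROM orders
   )
SELECT location, sum(rev) AS rev, sum(exp) AS exp
FROM  (
   SELECT location_1 AS location, rev, exp FROM cte
   UNION ALL
   SELECT location_2            , rev, exp FROM cte
   ) sub
GROUP  BY 1;
```

Note that EXPLAIN with the ANALYZE option actually executes the query. Running each statement a few times and comparing the reported execution times gives a fairer picture than a single run, because the first run may be slowed by a cold cache.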
