Search code examples
sqlpostgresqlaggregate-functionswindow-functions

Group users by revenue and years


I have problem grouping users in 'Group_3' in 2 consecutive years. Idea is to group users in group 3 that placed orders that are more then 2000 in revenue in 2 consecutive years.

SELECT email_id,
       COUNT(*) as num_of_orders,
       SUM(revenue) AS total_money_spent,
       CASE WHEN 
                SUM(CASE WHEN order_creation_date BETWEEN '2016-01-01' and '2016-12-   31'
                THEN revenue END) > 2000
       THEN 'Group_1'
            WHEN 
                SUM(CASE WHEN order_creation_date BETWEEN '2015-01-01' and '2016-12-31'
                THEN revenue END) > 2000
       THEN 'Group_2'
            WHEN 
                SUM(CASE WHEN EXTRACT(years FROM order_creation_date) <> LAG(EXTRACT(years FROM order_creation_date)) 
                OVER(PARTITION BY email_id ORDER BY EXTRACT(years FROM order_creation_date)) + 1 
                THEN revenue END) > 2000
                THEN 'Group_3'
            WHEN 
                SUM(CASE WHEN order_creation_date BETWEEN '2015-01-01' and '2016-12-31'
                THEN revenue END) < 2000
       THEN 'Group_4'
            WHEN 
                SUM(CASE WHEN order_creation_date BETWEEN '2015-01-01' and '2016-12-31'
                THEN revenue END) = 0
       THEN 'Group_5'
         END
FROM sql_test
GROUP BY email_id
ORDER BY num_of_orders DESC;

I get:

aggregate function calls cannot contain window function calls
LINE 13: ...E WHEN EXTRACT(years FROM order_creation_date) <> LAG(EXTRAC..

I can't use LAG in CASE query I guess?


Solution

  • Window functions are executed after aggregate functions in a SELECT statement. You can't nest them in reverse order. Consider the sequence of events:

    Use a subquery instead:

    SELECT email_id
         , COUNT(*) as num_of_orders
         , SUM(revenue) AS total_money_spent
         , CASE
              WHEN SUM(revenue) FILTER (WHERE order_creation_date BETWEEN '2016-01-01' AND '2016-12-31') > 2000 THEN 'Group_1'
              WHEN SUM(revenue) FILTER (WHERE order_creation_date BETWEEN '2015-01-01' AND '2016-12-31') > 2000 THEN 'Group_2'
              WHEN SUM(revenue) FILTER (WHERE EXTRACT(year FROM order_creation_date) <> last_year + 1)   > 2000 THEN 'Group_3'
              WHEN SUM(revenue) FILTER (WHERE order_creation_date BETWEEN '2015-01-01' AND '2016-12-31') < 2000 THEN 'Group_4'
              WHEN SUM(revenue) FILTER (WHERE order_creation_date BETWEEN '2015-01-01' AND '2016-12-31') = 0    THEN 'Group_5'
           END AS my_sum
    FROM  (
       SELECT *, LAG(EXTRACT(year FROM order_creation_date)) OVER (PARTITION BY email_id ORDER BY EXTRACT(year FROM order_creation_date)) AS last_year
       FROM   sql_test
       ) sub
    GROUP  BY email_id
    ORDER  BY num_of_orders DESC;
    

    While being at it: aggregate FILTER expressions are more elegant & faster than those CASE expressions. See: